Methods using DNA methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies

ABSTRACT

Methods using DNA Methylation arrays are provided for identifying a cell or mixture of cells and for quantification of alterations in distribution of cells in blood or in tissues, and for diagnosing, prognosing and treating disease conditions, particularly cancer. The methods use fresh and archival samples.

RELATED APPLICATIONS

This application claims the benefit of U.S. provisional application Ser. No. 61/865,479 filed Aug. 13, 2013, entitled, “Methods using DNA methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies”, and is a continuation-in-part of international application number PCT/US2012/39699 filed May 25, 2012, entitled, “Methods using DNA Methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies” which claims the benefit of provisional applications having Ser. Nos. 61/489,883 filed May 25, 2011 entitled, “Methods of Immunodiagnostics using DNA Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells”; 61/509,644 filed Jul. 20, 2011 entitled, “Methods of Immunodiagnostics using DNA Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells for prognosis and diagnosis of diseases”; 61/585,892 filed Jan. 12, 2012 entitled, “Methods of Immunodiagnostics using DNA Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells for prognosis and diagnosis of diseases”; and 61/619,663, filed Apr. 3, 2012 entitled, “Methods using DNA Methylation arrays for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies”, inventors Karl Kelsey, Eugene Andres Houseman, John Wiencke, William P. Accomando, Jr. and Carmen Marsit, of which each patent application is hereby incorporated by reference herein in its entirety.

GOVERNMENT SUPPORT

This invention was made with government support under grants R01CA126831, R01CA52689, R01CA126939, R01CA121147, R01CA100679, R01CA078609, R01ES06717, R01MH094609 and P50-CA97257 awarded by the National Institutes of Health. The government has certain rights in the invention.

TECHNICAL FIELD

Methods of determining altered immune cell distribution to diagnose or prognose a disease condition based on determining DNA methylation signatures of a specific immune cell type or of a mixture of immune cell types are provided.

BACKGROUND

Leukocytes, commonly called white blood cells, are cells that are primarily responsible for mounting an immune response by a host to pathogens and to foreign antigens. Leukocyte distribution is currently determined by simple histologic or flow cytometric assessments. These methods have significant limitations. In particular, flow cytometry is limited by the following: availability of fluorescent antibody tags, the laborious nature of the antibody tagging process, the need for separation of cells requiring large volumes of fresh cells, expensive technology as well as equipment for detection of cells, and the need to maintain the integrity of the outer membrane of the cells to preserve labile protein epitopes. A further limitation of methods requiring fresh cells is that the methods are not useful in situations in which prospective studies are impractical, such as in the case of rare diseases, in which large numbers of disease subjects are not available. In these cases retrospective studies are needed to correlate disease outcome with disease parameters. However, retrospective studies can be performed only if archival samples derived from archived cohort populations can be used to analyze the disease parameters. Currently there are no known methods in which archived samples from patients and normal subjects can be used to provide a quantitative estimate of leukocyte distributions in disease conditions.

Thus there is a need for methods that provide quantification of alterations in distribution of leukocytes in blood or tissues in disease conditions that do not rely upon fresh samples, that are not labor intensive and that do not use expensive technology or equipment.

SUMMARY

In diverse medical conditions such as in disease or in instances of immune-toxic exposure, the leukocyte distribution in blood or tissues contains information about the underlying immune-biology of the medical condition which is useful for diagnosis, prognosis or treatment of the medical condition, or for monitoring response to therapy. Accordingly, an embodiment of the invention provides a method for assessing a disease condition in a subject, including: measuring a CD3Z positive T lymphocyte cell number in a sample from the subject by analyzing methylation in the sample of at least one CpG dinucleotide (CpG) in gene CD3Z or in an orthologous or a paralogous gene thereof, such that an amount of a demethylated C of the at least one CpG in the sample is a measure of CD3+ T lymphocyte cell number; and comparing the amount of the demethylated C in the sample from the subject with that in positive control samples from patients with the disease condition, and with that in negative control samples from healthy subjects, such that the disease condition is selected from: an autoimmune disease, an allergy, a transplant rejection, obesity, an inherited disease, immunosuppression and a cancer. As used herein “subject” refers to any animal, for example, a mammal that is healthy or that has a disease condition, for example a human, or a high value agricultural animal or a zoo animal. A “patient” is a subject that either has a disease condition or is in need of obtaining a diagnosis of a disease condition.

A related embodiment of the method includes at least one of: monitoring, diagnosing, prognosing, and measuring response to therapy by comparing the measured CD3+ T lymphocyte cell numbers in the subject after therapy to that in the patients with the disease condition and in the healthy subjects.

An embodiment of the method provides that the inherited disease is an aneuploidy. For example, aneuploidy is selected from trisomy 21, Turner's syndrome, and Klinefelter's syndrome.

The sample used in the method is a fresh sample. For example, the fresh sample is freshly drawn blood, a tumor infiltrate or cells obtained from a lymph node puncture. Alternatively, the sample is an archival sample. For example, the archival sample is archival blood collected and stored on filter paper cards such as a Guthrie card, frozen blood specimens or frozen tissue. Demethylation of DNA is a stable chemical modification of DNA, and archival samples are used to measure cell numbers. Flow cytometry, in contrast, requires fresh cells, because detection of cells depends on the availability of protein epitopes, which are labile and not well preserved in archival samples.

In a related embodiment of the method the amount of the demethylated C of the at least one CpG in the CD3Z gene in the sample is at least about 80%, at least about 90%, or at least about 95% of the total amount of the CpG in CD3Z genes in the sample.

An embodiment of the method further involves analyzing the methylation of the CD3Z gene by amplifying by Polymerase Chain Reaction (PCR) using primer pairs specific for amplification of specific demethylated CpG loci. For example, amplification by PCR involves monitoring quantitative PCR in real time using a MethyLight assay or using digital PCR. In various embodiments, the CpG loci are listed herein. For example, at least one gene or locus is selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140. In various embodiments, at least one locus is selected from the group consisting of: FGD2, HLA-DOB, BLK, IGSF6, CLDN15, SFT2D3, ZNF22, CEL, HDC, GSG1, FCN1, OSBPL5, LDB2, NCR1, EPS8L3, CD3D, PPP6C, CD3G, TXK, and FAIM. In various embodiments, at least one locus is selected from the group consisting of: CLEC9A (2 loci), INPP5D, INHBE, UNQ473, SLC7A11, ZNF22, XYLB, HDC, RGR, SLCO2B1, C1orf54, TM4SF19, IGSF6, KRTHA6, CCL21, SLC11A1, FGD2, TCL1A, MGMT, CD19, LILRB4, VPREB3, FLJ10379, HLA-DOB, EPS8L3, SHANK1, CD3D (2 loci), CHRNA3, CD3G (2 loci), RARA, and GRASP. The nucleotide sequence and corresponding amino acid sequence of each of the genes or loci herein are listed and characterized in genome or protein databases such as GenBank, European Nucleotide Archive, European Bioinformatics Institute, GenomeNet, or The National Center for Biotechnology Information (NCBI) Protein database. The nucleotide sequences of the loci in computer readable form as an ASCII text file (114 kilobytes) created Nov. 25, 2013 entitled “SEQ_ID_(—)11252013” containing sequence listings numbers 1-140 has been electronically filed herewith and is incorporated by reference herein in its entirety. In various embodiments, each locus includes a portion of any of the sequences described herein.

An embodiment of the method further involves analyzing the methylation of the CD3Z gene by a method selected from the group of: Pyrosequencing, Methylation-sensitive single-nucleotide primer extension (Ms-SNuPE), Methylation-sensitive single stranded conformation analysis (MS-SSCA), High resolution melting analysis (HRM), and digital PCR methods comprising emulsion and nanofluidic partitioning. According to a related embodiment, Methylation-sensitive single-nucleotide primer extension further includes: chemically converting the lymphocyte derived whole genomic DNA with bisulfite; amplifying chemically converted whole genomic DNA; enzymatically fragmenting resulting amplified DNA; hybridizing fragmented DNA to methylation sensitive CpG locus specific DNA oligomers; and labeling by single-base extension using fluorescently labeled nucleotides.

Another embodiment of the method further provides steps for analyzing methylation of differentially methylated regions (DMRs) of gene FOXP3, using primer pairs for amplification of specific loci of demethylated CpG in the FOXP3 gene. Within a gene, “loci” as used herein refers to locations of CpG dinucleotide containing sequences present in that gene, and only one or a few may be differentially demethylated in a specific cell.

A related embodiment of the method further includes: determining a ratio of CpG demethylation of FOXP3 gene DMR to the CpG demethylation of CD3Z gene DMR, in a sample of tumor infiltrate, such that the ratio involves an index of T regulatory cell number to the total T cell number in the infiltrate; and the method further involves diagnosing of a pathological grade of the cancer, so that the index of T regulatory cell number to the total T cell number in the tumor infiltrate correlates with the grade of the cancer. In a related embodiment, the cancer is selected from: a glioma; an ovarian cancer; a head and neck squamous cell cancer (HNSCC), breast cancer, lung cancer, prostate cancer, colon cancer, pancreatic cancer, bladder cancer, cervical cancer and liver cancer.
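By way of illustration only, the ratio described above can be sketched as a short calculation. The function name `treg_index` and the representation of each DMR measurement as a demethylated fraction between 0 and 1 are assumptions made for this sketch and are not specified herein; the sketch does not represent the assay itself.

```python
def treg_index(foxp3_demethylated, cd3z_demethylated):
    """Return the ratio of FOXP3 DMR demethylation to CD3Z DMR demethylation.

    Both arguments are assumed to be demethylated fractions in [0, 1]
    measured in the same sample (a hypothetical input convention).
    """
    for value in (foxp3_demethylated, cd3z_demethylated):
        if not 0.0 <= value <= 1.0:
            raise ValueError("demethylated fractions must lie in [0, 1]")
    if cd3z_demethylated == 0.0:
        # No detectable CD3Z demethylation: the index is undefined.
        raise ValueError("CD3Z demethylation is zero; ratio is undefined")
    return foxp3_demethylated / cd3z_demethylated


# Hypothetical values: 2% FOXP3 demethylation, 25% CD3Z demethylation.
index = treg_index(0.02, 0.25)
```

Under these assumed inputs the index is about 0.08, i.e. roughly 8% of the T cells in the sample would be counted as T regulatory cells.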

In a related embodiment the method further includes prognosing survival of a patient having or needing a diagnosis of glioma or HNSCC, in which an amount of demethylation of CD3Z gene DMR in the patient as a percent of total DNA greater than a median value in a sample population of subjects correlates with a prognosis of poor survival.

An embodiment of the invention provides a kit for measuring CD3+ T lymphocyte and FOXP3+ T regulatory cell numbers by analyzing methylation of CpG positions in CD3Z and FOXP3 genes, the kit having sequencing and PCR primers specific for the CD3Z and the FOXP3 gene DMRs and instructions for analyzing and comparing the CpG methylation between healthy subjects and a patient.

An embodiment provides a method for assessing a disease condition by estimating an alteration in proportions of types of leukocytes in a sample from a subject, the method including the steps of: measuring a DNA methylation profile for each type of leukocyte and for unfractionated cells, such that DNA methylation profiles are obtained for a plurality of CpG loci, and obtaining the status of an individual CpG locus by amplifying DNA from each of the types of leukocyte and from the unfractionated cells, such that amplifying comprises hybridizing methylation sensitive locus-specific DNA oligomers corresponding to each CpG locus; ordering CpG loci by ability to distinguish types of leukocytes, such that the ordering of the CpG loci determines differentially methylated DNA regions (DMRs), such that obtaining DMRs comprises statistically minimizing introduction of bias in amount of total methylation status of a large number of CpG loci obtained from the unfractionated cells by employing a Bayesian treatment of prior probabilities of the methylation status at each individual locus, thereby identifying a plurality of CpG loci to include in the measurement, such that an amount of CpG loci distinguishes DMR signatures among the types of leukocytes and minimizes bias; obtaining DNA methylation profiles comprising DMRs from the types of leukocytes, such that the DNA methylation profiles comprise validating measures of relative amounts of the types of leukocytes, and obtaining DNA methylation profiles of the unfractionated cells as surrogate measures of relative amounts of each leukocyte type in the unfractionated cells; employing an analog of a measurement error model wherein a DNA methylation surrogate y is reverse formulated with respect to the disease outcome z, as

y=ƒ(z),

such that y denotes a multivariate random variable representing a methylation profile, z denotes a disease outcome or state, and ƒ denotes a probability distribution; y, z, and leukocyte distribution, ω, are related by the estimator equations,

E(y|ω)=g(ω), and

under an assumption E(z|ω,y)=E(z|ω), such that E denotes an expectation of a random variable and ω denotes a subject specific distribution of leukocytes; and, comparing relative amounts of each type of leukocyte in the sample from the subject with those in a control sample, thereby providing an assessment of the disease condition. In related embodiments, the locus-specific DNA oligomers are linked to an array selected from the group of: a glass slide array; a quartz slide array; a fiber optic bundle array, a planar slide array, a micro-well array; a multi-well dish array; a digital PCR array; and a bead array having beads located at known addressable locations on the array. A related embodiment of the method further provides at least one of the steps of: monitoring, diagnosing, prognosing and measuring response to therapy of the disease condition.
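By way of illustration only, the relation E(y|ω)=g(ω) can be sketched under the simplifying assumption that g is linear, i.e. that the methylation profile of unfractionated cells is a weighted sum of reference profiles of purified leukocyte types. The linear form, the function name `estimate_cell_proportions`, the projected gradient solver and all numerical values are assumptions made for this sketch and are not taken from the description herein.

```python
def estimate_cell_proportions(reference, mixture, iterations=2000):
    """Estimate non-negative leukocyte proportions omega from a mixture profile.

    reference: list of rows, one per CpG locus; each row holds the methylation
               fraction of that locus in each purified cell type.
    mixture:   methylation fraction of each locus in unfractionated cells.
    Minimizes 0.5 * ||reference @ omega - mixture||^2 subject to omega >= 0
    by projected gradient descent (an assumed, illustrative solver).
    """
    n_types = len(reference[0])
    # 1 / (squared Frobenius norm) is a safe step size for this objective.
    step = 1.0 / sum(v * v for row in reference for v in row)
    omega = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(m * w for m, w in zip(row, omega)) - y
            for row, y in zip(reference, mixture)
        ]
        gradient = [
            sum(row[k] * r for row, r in zip(reference, residual))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto omega >= 0.
        omega = [max(0.0, w - step * g) for w, g in zip(omega, gradient)]
    return omega


# Hypothetical reference profiles: 4 CpG loci (rows) x 3 cell types (columns).
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.7],
]
# Hypothetical mixture profile generated from proportions (0.6, 0.3, 0.1).
mixture = [0.58, 0.34, 0.18, 0.61]
proportions = estimate_cell_proportions(reference, mixture)
```

The estimated proportions of the sample can then be compared with those of a control sample, as recited above. The sketch omits the bias correction and sensitivity analysis described herein.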

The method in a related embodiment further includes analyzing sensitivity for correcting bias, such that correcting bias is unrelated to measurement error and is related to errors arising from unprofiled cell types and non-cell mediated profile differences. In related embodiments of the method, fractionated leukocyte types include at least one selected from: CD19+ B lymphocytes, CD15+ granulocytes, CD14+ monocytes, CD56+ Natural Killer cells, and CD3+ T lymphocytes.

In an embodiment of the method the disease condition is Head and Neck Squamous Cell Carcinoma (HNSCC).

An embodiment of the method provides that the inherited disease is an aneuploidy. For example, aneuploidy is selected from trisomy 21, Turner's syndrome, and Klinefelter's syndrome.

According to another embodiment of the method the control sample is taken from the subject at a different point in time for prognosis of the course of the disease condition in the subject. In another related embodiment, the method of assessing disease condition further includes, after employing the measurement model, comparing the distribution of leukocytes to the relative amounts in the control sample as a normal standard, such that the normal standard is a statistical measure obtained from a plurality of disease-free subjects.

In a related embodiment the method provides a diagnosis of immunosuppression due to smoking in a currently smoking subject by: determining a ratio of CpG demethylation of FOXP3 gene DMR to the CpG demethylation of CD3Z gene DMR in blood in the currently smoking subject, such that the ratio is an index of T regulatory cell number to the total T cell number; and providing a diagnosis of immunosuppression in the currently smoking subject, such that the value of the index of T regulatory cell number to the total T cell number in the currently smoking subject greater than the average value in a sample population of currently non-smoking subjects correlates with immunosuppression due to smoking. In a related embodiment of the method the subject with the currently-smoking or currently non-smoking status is a patient having a cancer, an infection or in need of a transplant.

An embodiment provides a method of predicting a methylation class membership in a bodily fluid sample of a subject for assessing disease status of the subject, in which the methylation class membership corresponds to an epigenetic signature of a plurality of leukocyte types, the method including: measuring amounts of DNA methylation in each of a plurality of leukocyte type populations to determine differentially methylated regions (DMRs);

ranking leukocyte DMRs for each leukocyte type according to statistical strength of association of the DMR with each leukocyte type; randomly dividing a data set of control subjects and subjects with a disease into groups having substantially the same numbers of control subjects and subjects with the disease to obtain a training set and a testing set; clustering samples in the training set using a defined number of highest ranked leukocyte DMRs to determine clustering solutions, in which a clustering solution corresponds to the methylation class membership; and predicting methylation class membership for subjects within the testing set by applying the clustering solutions obtained from the training set to the highest ranked leukocyte DMRs in the testing set, such that clinical utility of the predicted methylation class membership is determined by testing association of the predicted methylation class membership with the disease status of the subject.
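By way of illustration only, the step of randomly dividing the data set into a training set and a testing set having substantially the same numbers of control subjects and subjects with the disease can be sketched as follows. The function name `split_balanced`, the dictionary input format and the subject identifiers are hypothetical and are chosen only for this sketch.

```python
import random


def split_balanced(subject_ids_by_status, seed=0):
    """Randomly split subjects into training and testing sets.

    subject_ids_by_status maps a status label (e.g. "control", "disease")
    to a list of subject identifiers. Each status group is shuffled and
    halved separately, so both sets receive about the same number of
    controls and of subjects with the disease.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    training, testing = [], []
    for status, ids in subject_ids_by_status.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        training.extend((subject, status) for subject in shuffled[:half])
        testing.extend((subject, status) for subject in shuffled[half:])
    return training, testing


# Hypothetical subject identifiers.
subjects = {
    "control": ["c1", "c2", "c3", "c4"],
    "disease": ["d1", "d2", "d3", "d4"],
}
training_set, testing_set = split_balanced(subjects, seed=1)
```

Clustering of the training set and prediction in the testing set would follow this split; those steps are not shown in the sketch.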

According to an embodiment of the method, the highest ranked leukocyte DMRs are as shown in Table 21, in which each DMR is identified by chromosomal location and gene name, and the defined number of highest ranked leukocyte DMRs is selected from: at least 10, at least 20, at least 30, at least 40, and 50.

The methylation class membership of the subject in the testing set is predicted for example using a naïve Bayes classifier. Testing the association of the predicted methylation class with disease status includes for example using receiver operating characteristic (ROC) curves and the corresponding area under each curve.
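By way of illustration only, the area under a receiver operating characteristic curve can be computed from predicted scores and known disease status using the standard rank-based formulation, in which the area equals the probability that a randomly chosen subject with the disease receives a higher score than a randomly chosen control. The function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from scores and binary labels.

    labels: 1 for subjects with the disease, 0 for control subjects.
    Counts, over all (disease, control) pairs, how often the disease
    subject has the higher score; ties count as one half.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both disease and control subjects are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted scores for two disease subjects and two controls.
auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])  # 3 of 4 pairs ordered correctly: 0.75
```

An area of 0.5 corresponds to no association between the predicted class and disease status, and an area of 1.0 to perfect separation.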

The bodily fluid sample in some embodiments is a fresh sample, for example freshly collected blood or a blood derivative. Alternatively, the bodily fluid is an archival sample, for example stored frozen blood or archival blood collected and stored on a filter paper card such as a Guthrie card.

The method in a related embodiment includes at least one of: diagnosing, monitoring, prognosing and measuring response to therapy of the disease status.

In related embodiments the leukocyte types are selected from the group of: natural killer cells, B cells, CD4+ T cells, CD8+ T cells, granulocytes and monocytes. The disease according to an embodiment of the method is exemplified by one of: head and neck squamous cell carcinoma (HNSCC), ovarian cancer, and bladder cancer.

An array is provided as another embodiment for estimating proportions of leukocyte types in a sample from a mammal for assessing a disease condition of the mammal by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the array including: a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, such that the surface at each location is attached to a DNA probe having a specific nucleotide sequence, such that the DNA probe having the specific nucleotide sequence hybridizes to a nucleotide sequence of a methylated form or an unmethylated form of a CpG dinucleotide in a sequence of a gene of the plurality of genes in the sample, such that the array is selected from having: at least 16 probes, at least 64 probes, at least 96 probes, and at least 384 probes.

The plurality of probes, in a related embodiment of the array, have nucleotide sequences that hybridize with a respective plurality of 118 different nucleotide sequences which are found in nature occurring in the plurality of genes. In another related embodiment, the plurality of probes include at least one of SEQ ID NO: 1 to SEQ ID NO: 96. In various embodiments of the array, the plurality of probes have nucleotide sequences that hybridize with at least one gene or locus described herein. For example, the at least one gene or locus is any of SEQ ID NO: 1-140. In various embodiments, the at least one gene or locus is selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.

In a related embodiment of the array, the addressable locations are wells of a substrate, such that the substrate is selected from: glass slide; quartz slide; fiber optic bundle and planar silica slides. In another related embodiment the surfaces included in the array are particles added to the wells.

In alternative embodiments the addressable locations of the array are defined spots on a glass slide or are microbeads or particles labeled with a code. For example, the particles are microbeads in the form of glass cylinders identifiable with inscribed holographic code.

In various embodiments the disease condition is selected from: an autoimmune disease, an allergy, a transplant rejection, obesity, an inherited disease, immunosuppression and a cancer.

Another embodiment provides a method for estimating proportions of types of leukocytes in a sample from a subject for assessing a disease condition of the subject by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the method including: providing an array having a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, such that the surface at each location is attached to a DNA probe having a specific nucleotide sequence; reacting genomic DNA in the sample with a bisulfite reagent to convert unmethylated cytosine residues to uracil; hybridizing resulting bisulfite treated genomic DNA with the array to obtain resulting hybridized probes on the array, such that the DNA probes hybridize to a DNA sequence of each of a methylated form and an unmethylated form of a sequence having a CpG dinucleotide in a gene for each of the plurality of genes; and detecting the methylation status of each of the CpG dinucleotides in each sequence, thereby estimating proportions of types of leukocyte in the sample from the subject for assessing the disease condition of the subject.
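By way of illustration only, the effect of the bisulfite reaction on a single DNA strand can be sketched in silico: unmethylated cytosine residues are converted to uracil, while methylated cytosine residues remain cytosine, which is what allows a probe to distinguish the methylated form from the unmethylated form of a CpG. The function name `bisulfite_convert`, the zero-based position convention and the example sequence are hypothetical and are not taken from the sequence listing.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment of one DNA strand.

    sequence:             DNA string over A, C, G, T.
    methylated_positions: set of zero-based indices of methylated cytosines.
    Unmethylated C becomes U; methylated C and all other bases are unchanged.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical strand with two CpG sites; only the first (index 1) is methylated.
treated = bisulfite_convert("ACGTCG", {1})  # "ACGTUG"
```

After amplification the uracil residues are read as thymine, so the methylated and unmethylated forms of a locus differ in sequence at the CpG position.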

In a related embodiment, detecting the methylation status of the CpG dinucleotide sequence includes: extending each hybridized probe of the resulting hybridized probes on the array by primer extension to obtain a resulting primer extension product; ligating the resulting primer extension product to an oligonucleotide complementary to the DNA sequence of a 3′ region of the gene to obtain a resulting template for PCR on the array; and amplifying by PCR and measuring amount of resulting PCR product, thereby detecting the methylation status of the CpG dinucleotide containing nucleotide sequence.

In another related embodiment amplifying by PCR further includes: amplifying the resulting template on the array using primer pairs including a 5′ primer specific to each of the methylated or the unmethylated form of the CpG dinucleotide containing gene, and a 3′ primer specific to the gene containing the CpG dinucleotide, thereby resulting in a first PCR product; amplifying the resulting first PCR product with differentially labeled 5′ primers that specifically amplify either the methylated or the unmethylated form of the gene having the CpG dinucleotide containing nucleotide sequence, and a common 3′ primer, resulting in a differentially labeled second PCR product; and hybridizing the second PCR product to the CpG dinucleotide containing gene for measuring amount of the second PCR product, thereby detecting the methylation status of the CpG dinucleotide sequence.

Detecting the methylation status of the CpG dinucleotide sequence, in another related embodiment of the method, includes extending the resulting hybridized probes on the array by single base primer extension with a labeled nucleotide.

The array used in the method, in a related embodiment, includes at least 16 probes, at least 64 probes, at least 96 probes or at least 384 probes. In another related embodiment of the method the plurality of probes on the array hybridizes with a plurality of 118 different nucleotide sequences occurring in the plurality of genes. In yet another related embodiment of the method each probe on the array is complementary to nucleotide sequences having SEQ ID NO: 1 to SEQ ID NO: 96.

In various embodiments of the method, at least one probe on the array is complementary to a nucleotide sequence described herein, for example the nucleotide sequence corresponds to a gene or locus described herein. In various embodiments, the gene or the locus is found herein in an example, a figure, or a table. In various embodiments, the gene or locus is selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.

In various embodiments of the method, the disease condition assessed is selected from: an autoimmune disease, an allergy, a transplant rejection, obesity, an inherited disease, and a cancer. Assessing the disease condition using the array, in related embodiments of the method, includes at least one of: monitoring, diagnosing, prognosing, and measuring response to therapy by comparing estimated proportions of types of leukocytes of the subject after therapy to proportions of leukocytes from a healthy subject.

In a related embodiment of the method the sample containing the genomic DNA used to hybridize with the probes on the array is fresh, i.e., obtained in real time prior to performing the method. In another related embodiment of the method the sample is archival.

In various embodiments of the method for estimating proportions of leukocytes using the array, the leukocyte types include at least one selected from: CD19+ B lymphocytes, CD15+ granulocytes, CD14+ monocytes, CD56+ Natural Killer cells, and CD3+ T lymphocytes.

Another related embodiment provides a kit for estimating proportions of leukocyte types in a sample by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the kit including: an array having: a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, such that the surface at each location is attached to a DNA probe having a specific nucleotide sequence, such that the DNA probe having the specific nucleotide sequence hybridizes to a DNA sequence of a methylated form or an unmethylated form of a CpG dinucleotide in a sequence of a gene of the plurality of genes in the sample, such that the array is selected from having: at least 16 probes, at least 64 probes, at least 96 probes, and at least 384 probes; primers and reagents for detecting the hybridized probes and for detecting the reaction products derived from the hybridized probes; and instructions for using the array with a bisulfite reagent, thereby providing an estimation of proportions of leukocyte types in the sample.

In a related embodiment of the kit, the probes hybridize with a respective plurality of 118 different DNA sequences occurring in the plurality of genes. In yet another related embodiment of the kit the probes have nucleotide sequences complementary to 96 nucleotide sequences having SEQ ID NO: 1 to SEQ ID NO: 96.

In various embodiments of the kit, at least one probe is complementary to a nucleotide sequence described herein, for example at least one nucleotide sequence corresponds to a gene or locus described herein. For example, the gene or locus is shown or listed in an example, a figure, or a table herein. In various embodiments, the gene or locus is at least one selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.

The instructions in a related embodiment of the kit include methods for: reacting genomic DNA in the sample with the bisulfite reagent to convert unmethylated cytosine residues to uracil; hybridizing resulting bisulfite treated genomic DNA with probes immobilized to the surfaces to obtain resulting hybridized probes on the array, such that the DNA probes hybridize to a DNA sequence of each of a methylated form and an unmethylated form of a CpG dinucleotide sequence in a gene of the plurality of genes; and detecting the methylation status of the CpG dinucleotide sequence, thereby estimating proportions of leukocyte types in the sample from the subject for assessing the disease condition of the subject.

In a related embodiment of the kit the instructions for detecting the methylation status of the CpG dinucleotide sequence include methods for: extending each hybridized probe of the resulting hybridized probes on the array by primer extension to obtain a resulting primer extension product; ligating the resulting primer extension product to an oligonucleotide complementary to the DNA sequence of a 3′ region of the gene to obtain a resulting template for PCR on the array; and amplifying by PCR and measuring amount of resulting PCR product, thereby detecting the methylation status of the CpG dinucleotide sequence.

In another related embodiment of the kit, the instructions for amplifying by PCR include methods for: amplifying the resulting template on the array using primer pairs having a 5′ primer specific to each of the methylated or the unmethylated form of the CpG dinucleotide containing gene, and a 3′ primer specific to the gene containing the CpG dinucleotide, thereby resulting in a first PCR product; amplifying the resulting first PCR product with differentially labeled 5′ primers that specifically amplify each of the methylated and unmethylated form of the CpG dinucleotide sequence containing gene, and a common 3′ primer, resulting in a differentially labeled second PCR product; and hybridizing the second PCR product to the CpG dinucleotide containing gene for measuring amount of the second PCR product, to detect the methylation status of the CpG dinucleotide sequence.

Instructions for detecting the methylation status of the CpG dinucleotide sequence, in another related embodiment of the kit, include methods for extending the resulting hybridized probes on the array by single base primer extension with a labeled nucleotide.

Another embodiment of the invention is a method of treating a subject for a disease condition, such that the subject is a human patient and such that the disease condition is a cancer, the method comprising: obtaining signatures comprising differentially methylated regions (DMRs) from types of leukocytes in a blood sample of the patient, the types of leukocytes comprising at least one selected from: CD19+ B lymphocyte, CD15+ granulocyte, CD14+ monocyte, CD56^(dim) Natural Killer cell, CD56^(bright) Natural Killer cell, and CD3+ T lymphocyte, and from a healthy control human subject not having the cancer; comparing a signature specific for the type of leukocyte in the patient with that in the healthy subject, such that the type of leukocyte specific signature is an indication of amount of cells of the type of leukocyte circulating in blood, and such that a decreased amount of the cells of the type of leukocyte circulating in the blood of the patient compared to the healthy subject is an indicium of the cancer; and, administering a composition comprising the cells of the type of leukocyte to the patient, thereby increasing the amount of the cells of the type of leukocyte in the patient and treating the cancer.

In various embodiments of the method the leukocyte type cell is the CD56^(dim) Natural Killer cell.

The cancer in related embodiments of the method is head and neck squamous cell carcinoma (HNSCC). In embodiments of the method the DMR signature specific for CD56^(dim) Natural Killer cells includes at least one CpG dinucleotide in a region near the promoter of gene NKp46. In other embodiments of the method the DMR signature specific for CD56^(dim) Natural Killer cells is a CpG dinucleotide in a region near the promoter of the gene NKp46, such that the methylation status of the CpG dinucleotide is quantified by methylation specific quantitative polymerase chain reaction (MS-qPCR) using primers and probes having SEQ ID NOs: 116-118 and 97-99. According to other embodiments of the method, the DMR signature specific for CD56^(dim) Natural Killer cells is a CpG dinucleotide in a region near the promoter of the gene NKp46, such that the methylation status of the CpG dinucleotide is quantified by digital PCR involving emulsion and nanofluidic partitioning using primers and probes having SEQ ID NOs: 116-118 and 97-99.

In related embodiments of the method the blood sample is archival. Alternatively the blood sample is fresh.

In various embodiments of the method, the signature comprises at least one gene or locus described or shown in examples herein, for example SEQ ID NO: 1-96 and 119-140. In various embodiments of the method, the at least one gene or locus is selected from the group consisting of: FGD2, HLA-DOB, BLK, IGSF6, CLDN15, SFT2D3, ZNF22, CEL, HDC, GSG1, ECM1, OSBPL5, LDB2, NCR1, EPS8L3, CD3D, PPP6C, CD3G, TXK, and FAIM. In various embodiments of the method, the at least one gene or locus is selected from the group consisting of: CLEC9A (2 loci), INPP5D, INHBE, UNQ473, SLC7A11, ZNF22, XYLB, HDC, RGR, SLCO2B1, C1orf54, TM4SF19, IGSF6, KRTHA6, CCL21, SLC11A1, FGD2, TCL1A, MGMT, CD19, LILRB4, VPREB3, FLJ10379, HLA-DOB, EPS8L3, SHANK1, CD3D (2 loci), CHRNA3, CD3G (2 loci), RARA, and GRASP.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a photograph of a clustering heatmap for External Validation White Blood Cell Data (S₀). The data were obtained by applying the measurement error formulation described in Examples 1-3. The method delineates effects resulting from immune cell distribution as compared to those resulting from other “non-cell type” alterations in DNA methylation. Methylation array procedure was carried out using Infinium HumanMethylation27 Beadchip Microarrays from Illumina, Inc. (San Diego, Calif.). The White Blood Cell data were gathered from a set of 46 samples of purified white blood leukocyte subtypes obtained commercially. Light=unmethylated (Y_(hj)=0), black=partially methylated (Y_(hj)=0.5), dark=methylated (Y_(hj)=1).

FIG. 2 is a chart of the results of cell mixture reconstruction experiments validating prediction of individual sample profiles. The reconstruction experiments involved six known mixtures of monocytes and B cells and six known mixtures of granulocytes and T cells. Known fractions (Expected) and the resulting predictions from Infinium 27K profiles (Observed) are shown as percentages of each cell type, indicated by shade (dark=100, white=0).
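By way of illustration only, the comparison of Expected and Observed fractions for a two-component mixture may be sketched in Python as follows. The sketch assumes hypothetical reference methylation values for each purified cell type at a handful of CpG sites, and it estimates the mixing fraction by ordinary least squares clipped to the range 0 to 1. The function name, the reference values, and the use of this simple estimator in place of the measurement error formulation of Examples 1-3 are all assumptions made for the example.

```python
def estimate_two_cell_fraction(mixture, profile_a, profile_b):
    """Least-squares fraction of cell type A in a mixture of types A and B.

    Models each CpG as: mixture = w * profile_a + (1 - w) * profile_b,
    solves for w, and clips the result to [0, 1].
    """
    num = sum((y - b) * (a - b) for y, a, b in zip(mixture, profile_a, profile_b))
    den = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    if den == 0:
        raise ValueError("profiles are identical; the fraction is not identifiable")
    return min(1.0, max(0.0, num / den))


# Hypothetical reference methylation values (0 = unmethylated, 1 = methylated).
monocyte_profile = [0.10, 0.85, 0.20, 0.90]
b_cell_profile = [0.80, 0.15, 0.70, 0.30]

# Build a synthetic mixture with a known (Expected) monocyte fraction.
expected_fraction = 0.25
mixture = [
    expected_fraction * m + (1 - expected_fraction) * b
    for m, b in zip(monocyte_profile, b_cell_profile)
]

observed_fraction = estimate_two_cell_fraction(mixture, monocyte_profile, b_cell_profile)
print(f"Expected {expected_fraction:.0%}, Observed {observed_fraction:.0%}")
```

With noise-free synthetic data the Observed fraction recovers the Expected fraction; with measured array data the two would be expected to differ by measurement error.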

FIG. 3 is a photograph of a clustering heatmap for Target HNSCC data (S₁). The target data set S₁ consisted of arrays applied to whole blood specimens collected in a random subset of individuals involved in an ongoing population-based case-control study (Peters et al., 2005) of head and neck cancer (HNSCC): 92 cases and 92 age and sex matched controls. Blood was drawn at enrollment (prior to treatment in 85% of the cases). Yellow or light areas represent unmethylated (Y_(hj)=0), black areas represent partially methylated (Y_(hj)=0.5), gray areas represent methylated (Y_(hj)=1). The annotation track above the heatmap indicates case-control status.

FIG. 4 is a graphical representation of bias sensitivity analysis for HNSCC Data. Bias was assessed by resampling the case coefficients of B₁, a procedure that assumes maximum bias. The abscissa shows the number of assumed non-zero alterations. The knob-shaped central portion of each thick vertical line (red) indicates the median value, the thick vertical lines (blue) indicate the interquartile range, the thin lines (blue) represent 95% probability ranges, and the upper dots (black) represent 99% probability ranges.

FIG. 5A and FIG. 5B are graphs of Rate-of-Convergence of the Hessian matrix H_(m), which allows the determination of the optimal number of CpG sites whose combined methylation status measurements most accurately reflect the exact distribution of different cells in a mixture. The x-axis represents increasing m, the number of CpG sites (ordered by F-statistic) included in the model space, on a logarithmic scale.

FIG. 5A shows convergence by correlating the Hessian Matrix with the number of CpG sites included in the measurement. The dotted line shows the tangent at low values of m.

FIG. 5B shows the rate of convergence, which was calculated by smoothing the first differences of log₁₀(trH_(m)). The dotted line (red) in FIG. 5B corresponds to linear convergence.
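For illustration, the calculation of smoothed first differences of log₁₀(trH_(m)) may be sketched in Python as follows. The sketch assumes that the trace of H_(m) has already been computed for successive values of m and is supplied as a list; the moving-average smoother, the window length, and the trace values are hypothetical choices made for the example, since the particular smoother is not specified here.

```python
import math


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def convergence_rate(traces, window=3):
    """Smoothed first differences of log10(trace), ordered by increasing m."""
    logs = [math.log10(t) for t in traces]
    first_differences = [later - earlier for earlier, later in zip(logs, logs[1:])]
    return moving_average(first_differences, window)


# Hypothetical traces of H_m for five successive model sizes.
hypothetical_traces = [1000.0, 100.0, 10.0, 1.0, 0.1]
rates = convergence_rate(hypothetical_traces)
print(rates)
```

In this toy input the trace falls by a constant factor at each step, so every smoothed difference is the same, which corresponds to a straight line on the logarithmic scale.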

FIG. 6 is a photograph of a clustering heatmap for Target Ovarian Cancer data (S₁) (Teschendorff et al., 2009, PLoS ONE 4, e8274). Only those cases were included in which blood was collected pre-treatment. After removing four arrays with a preponderance of missing values, the data set consisted of 272 controls and 129 cases having blood drawn prior to treatment. Light=unmethylated (Y_(hj)=0), black=partially methylated (Y_(hj)=0.5), dark=methylated (Y_(hj)=1). The annotation track above the heatmap indicates case-control status (cancer case or control).

FIG. 7 is a photograph of a clustering heatmap for Target Down Syndrome Data. The method herein was applied to a trisomy 21 (Down syndrome) data set (Kerkel et al., PLoS Genet. 2010, 6(11):e1001212) consisting of 29 total peripheral blood leukocyte samples from Down syndrome cases and 21 controls, as well as six T cell samples from cases and four T cell samples from controls (GEO Accession number GSE25395). Light=unmethylated (Y_(hj)=0), black=partially methylated (Y_(hj)=0.5), dark=methylated (Y_(hj)=1). The annotation track above the heatmap indicates case-control and cell type status [Down syndrome case (whole blood), control (whole blood), T cell (pooled cases and controls)].

FIG. 8 is a photograph of a clustering heatmap for Target Obesity Data obtained from applying the methods herein to an obesity data set (Wang et al., BMC Med 2010, 8:87) having 7 lean African-Americans and 7 obese African-Americans (GEO Accession number GSE25301). Light areas represent unmethylated (Y_(hj)=0), black areas represent partially methylated (Y_(hj)=0.5), grey areas represent methylated (Y_(hj)=1). The annotation track above the heatmap indicates case-control status (obese and lean).

FIG. 9 is a photograph of the methylation profiles of white blood cells obtained from a DNA methylation array analysis described in Example 9. Methylation array assay was performed using Infinium HumanMethylation27 Beadchip Microarrays obtained from Illumina, Inc. (San Diego, Calif.). The number of individual leukocyte samples in each methylation class is shown in the table to the right. The DNA methylation profile distinguishes Lymphocytes from Myeloid Derived Leukocytes. The 5000 most variable CpG loci are plotted on the left. Less methylated loci are represented as grey areas and more methylated loci are represented as black areas. A recursively partitioned mixture model (RPMM) of autosomal gene Infinium beta values from sorted human peripheral blood leukocytes was performed using an R version 2.11.1 of Illumina's software, which provides convenient mechanisms for loading and analyzing the results of methylation status, and for quality control and basic visualization tasks.

FIG. 10A and FIG. 10B are graphical representations of the DNA methylation status of regions in CD3E and CD3Z genes.

FIG. 10A shows DNA methylation status of a region in CD3E that was identified from the DNA methylation array analysis (the results of which are shown in FIG. 9) as one of the two candidate DMRs with specificity towards CD3+ T cells. The DNA methylation status was measured by pyrosequencing bisulfite converted DNA from different sorted, human, peripheral blood leukocytes.

FIG. 10B shows DNA methylation status of a region in CD3Z gene that was identified from the DNA methylation array analysis (the results of which are shown in FIG. 9) as one of the two candidate DMRs with specificity towards CD3+ T cells. The DNA methylation status of the region in CD3Z gene in different sorted, human, peripheral blood leukocytes was measured by MethyLight® qPCR.

FIG. 11 is a drawing of the genomic region containing CD3Z gene, based on information available from the public databases UniProt, RefSeq and GenBank. UniProt is a freely accessible universal protein resource of protein sequence and functional information. RefSeq is a collection that provides an integrated and annotated set of sequences including genomic DNA, transcripts and protein. GenBank® is the genetic sequence database of the National Institutes of Health which contains an annotated collection of publicly available DNA sequences.

FIG. 12 is a list of genomic regions used for measuring methylation of CD3Z and FOXP3 genes, for quantitating genome copy numbers, and a list of the corresponding primer and probe sequences. Underlined letters are “C” in CpG motifs.

FIG. 13A, FIG. 13B and FIG. 13C are graphical representations of standard calibration curves which show the relationship between copy numbers of genomic DNA and the signal obtained from quantitative real time methylation specific PCR. The calibration curves are used for quantifying CD3+ T cells, Tregs (FOXP3 demethylated) and ratios of Tregs/CD3+ T cells. DNA isolated from purified cell types was bisulfite converted and serially diluted into a background of fully methylated commercial DNA standard (Qiagen). The total genomic copy numbers of each sample within a dilution series remained constant. Log dilutions were performed in the appropriate range of Ct values corresponding to test samples (whole blood, tumor specimens). Using cytosine-less (C-less) primers, genome copy numbers for each test standard were measured to ensure adequate input DNA and to normalize the CD3+ and Treg assay values.

FIG. 13A shows the calibration curve for C-less total input (N=eight replicates); errors denote standard error of the mean Ct value.

FIG. 13B shows dilution of isolated normal PanT cells (N=seven replicates).

FIG. 13C shows dilution and calibration curve for isolated CD3+CD25+ T cells (N=eight replicates). Calibration curves (FIG. 13A-C) were used to estimate total input copies, CD3+ T cell and Tregs copies, respectively.
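As a non-limiting illustration of how a calibration curve of this kind may be used, the following Python sketch fits a straight line relating Ct to log₁₀ of the copy number of a dilution series and then inverts the line to estimate copies in a test sample. The copy numbers, Ct values, and function names are hypothetical and are not taken from FIG. 13A-C; ordinary least squares is assumed as the fitting procedure.

```python
import math


def fit_calibration(copies, ct_values):
    """Fit Ct = intercept + slope * log10(copies) by ordinary least squares."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the calibration line to estimate copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series and mean Ct values of the standards.
standard_copies = [10, 100, 1000, 10000]
standard_ct = [35.00, 31.68, 28.36, 25.04]

slope, intercept = fit_calibration(standard_copies, standard_ct)
estimated_copies = copies_from_ct(30.02, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, copies={estimated_copies:.0f}")
```

A negative slope reflects the decrease in Ct with each ten-fold increase in template; the same inversion would be applied separately to each assay using its own standards.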

FIG. 14A-D are a drawing and a set of graphical representations showing detection of CD3+ T cell numbers by measuring differential demethylation using MS-qPCR.

FIG. 14A is a schematic diagram showing methylation specific primers and probe targeting six CpGs (lollipops) in a region of the CD3Z gene identified herein as demethylated in CD3+ T cells.

FIG. 14B shows results of real time PCR. The real time PCR Ct values decreased linearly with a ten-fold increase in bisulfite converted CD3+ T cell DNA concentration. Bisulfite converted universal methylated DNA was used to keep total amount of DNA in samples constant. At least five replicates of each sample were plotted.

FIG. 14C shows correlation between T cell levels determined by flow cytometry and CD3Z MS-qPCR. Evaluation of CD3+ T cell level by flow cytometry was observed to be highly correlated with T cell quantification by CD3Z MS-qPCR in whole blood specimens from glioma patients and healthy donors.

FIG. 14D shows correlation between T cell counts obtained by immunohistochemical staining and CD3Z MS-qPCR. CD3+ T cell count by immunohistochemical staining correlates with T cell quantification by CD3Z MS-qPCR in excised tumors across histological subtypes. Pearson correlations and F-test p-values are shown in FIG. 14B-D.

FIG. 15A, FIG. 15B and FIG. 15C (FIG. 15A-C) are graphical representations showing T cells and Tregs in the peripheral blood of glioblastoma multiforme (GBM) patients and healthy donors determined by MS-qPCR for demethylation of specific CpG loci.

FIG. 15A shows comparison of T cell numbers in blood between GBM patients and control subjects measured using CD3Z demethylation assay.

FIG. 15B shows comparison of Tregs between GBM patients and control subjects measured using FOXP3 demethylation assay.

FIG. 15C is a graph showing comparison of Treg percent of T cells between GBM patients and control subjects determined by the ratio of FOXP3/CD3Z demethylation. Wilcoxon rank sum p-values are shown.
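The three quantities compared in FIG. 15A-C may be illustrated by the following Python sketch, which assumes that copy numbers of demethylated CD3Z, demethylated FOXP3, and total (C-less) input have already been estimated from their respective calibration curves. The function names and the copy numbers are hypothetical, and normalizing each assay to total input copies is an assumption of the example.

```python
def percent_of_input(demethylated_copies, total_input_copies):
    """Demethylated copies expressed as a percentage of total input copies."""
    if total_input_copies <= 0:
        raise ValueError("total input copies must be positive")
    return 100.0 * demethylated_copies / total_input_copies


def treg_percent_of_t_cells(foxp3_copies, cd3z_copies):
    """Ratio of FOXP3 to CD3Z demethylated copies, as a percentage."""
    if cd3z_copies <= 0:
        raise ValueError("CD3Z demethylated copies must be positive")
    return 100.0 * foxp3_copies / cd3z_copies


# Hypothetical copy numbers for a single blood sample.
total_copies = 4000.0
cd3z_copies = 1000.0
foxp3_copies = 50.0

t_cell_percent = percent_of_input(cd3z_copies, total_copies)
treg_percent = percent_of_input(foxp3_copies, total_copies)
treg_of_t_percent = treg_percent_of_t_cells(foxp3_copies, cd3z_copies)
print(t_cell_percent, treg_percent, treg_of_t_percent)
```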

FIG. 16A, FIG. 16B and FIG. 16C (FIG. 16A-C) are graphical representations showing association between cigarette smoking and peripheral blood T cells and Tregs in glioma patients and healthy donors determined by MS-qPCR for demethylation of specific CpG loci.

FIG. 16A shows a comparison of peripheral blood T cell levels, determined by CD3Z demethylation, among never, former and current cigarette smokers stratified by glioma case status (indicated “cases” on the abscissa).

FIG. 16B shows a comparison of peripheral blood Treg levels, determined by FOXP3 demethylation, among never, former and current cigarette smokers stratified by glioma case status.

FIG. 16C shows a comparison of peripheral blood Treg percent of T cells, determined by ratio of FOXP3 to CD3Z demethylation, among never, former and current cigarette smokers stratified by glioma case status. Wilcoxon rank sum p-values are shown.

FIG. 17A, FIG. 17B and FIG. 17C (FIG. 17A-C) are graphical representations showing levels of T cell and Treg infiltrates in excised glioma tumors determined by MS-qPCR for demethylation of specific CpG loci.

FIG. 17A shows T cell levels, determined by CD3Z demethylation, in solid glioma samples stratified by tumor grade.

FIG. 17B shows Treg levels, determined by FOXP3 demethylation, in solid glioma samples stratified by tumor grade.

FIG. 17C shows Treg percent of T cells, determined by ratio of FOXP3 to CD3Z demethylation, in solid glioma samples stratified by tumor grade. Wilcoxon rank sum p-values are shown.

FIG. 18A, FIG. 18B and FIG. 18C (FIG. 18A-C) are graphical representations of flow cytometry analysis of CD3+ T cells and total leukocytes in whole blood from glioma cases and controls.

FIG. 18A shows a forward and side scatter plot of a representative blood sample showing gating for lymphocytes and counting beads.

FIG. 18B shows lymphocyte subpopulation observed using gating for CD3 expression.

FIG. 18C shows CD45 gating on non-bead events. CD45+ low and high cells were added in order to count total CD45+ cells.

FIG. 19A-C are photographs and a line graph that show immunohistochemical (IHC) staining of a representative GBM specimen.

FIG. 19A shows CD3 staining. Average number of cells positive for staining was 418.

FIG. 19B shows CD8 staining. Average number of cells positive for staining was 296.

FIG. 19C shows correlation of CD3 and CD8 staining, Pearson r=0.992.

FIG. 20 is a set of two heatmaps showing results of MS-qPCR and bisulfite pyrosequencing of Magnetic activated cell sorting (MACS) sorted human leukocyte subsets. Abbreviations: B=B lymphocytes, Gran=Granulocytes, Neut=Neutrophils, Mono=Monocytes, NK=CD56+ Natural killer cells, NKdim=CD16+CD56dim natural killer cells, NKbr=CD16-CD56bright natural killer cells, NK8+=CD8+CD56+ natural killer cells, NK8-=CD8-CD56+ natural killer cells, NKT=CD3+CD56+ natural killer T cells, T=CD3+ T lymphocytes, CD8=CD3+CD8+ T lymphocytes (cytotoxic T cells), CD4=CD3+CD4+ T lymphocytes (helper T cells), Treg=CD3+CD4+CD25+FOXP3+ regulatory T cells.

FIG. 20A is a heatmap of DNA methylation in FOXP3 and CD3Z gene regions assessed by MS-qPCR.

FIG. 20B is a heatmap of DNA methylation at three CpG loci in the CD3Z gene assessed by bisulfite pyrosequencing.

FIG. 21A-C are graphical representations showing levels of T cell and Treg infiltrates in glioma tissues stratified by histological subtype determined by MS-qPCR for demethylation of specific CpG loci. Abbreviations: PA=Pilocytic Astrocytoma, EP=Ependymoma, OD=Oligodendroglioma, OA=Oligoastrocytoma, AS=Astrocytoma, GBM=Glioblastoma multiforme. Kruskal-Wallis one-way analysis of variance by rank test p-values are shown.

FIG. 21A shows T cell levels determined by CD3Z demethylation in solid glioma samples stratified by tumor histology.

FIG. 21B shows Treg levels determined by FOXP3 demethylation in solid glioma samples stratified by tumor histology.

FIG. 21C shows Treg percent of T cells, determined by ratio of FOXP3 to CD3Z demethylation in solid glioma samples stratified by histology.

FIG. 22A-C are graphical representations showing Kaplan Meier analysis of time of survival of glioma patients stratified according to whether the level of T cells or Tregs in the tumor infiltrates of the patients is above or below the median level of T cells or Tregs, respectively. Log Rank p-values are shown.

FIG. 22A shows survival (ordinate) of glioma patients as a function of time (abscissa) in relation to T cell levels as determined by CD3Z demethylation.

FIG. 22B shows survival of glioma patients in relation to Treg levels as determined by FOXP3 demethylation.

FIG. 22C shows survival of glioma patients in relation to Treg percent of T cells as determined by ratio of FOXP3 to CD3Z demethylation.
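For illustration only, stratification at the median followed by Kaplan Meier estimation, as depicted in FIG. 22A-C, may be sketched in Python as follows. The marker levels, survival times, and event indicators are hypothetical, the handling of values equal to the median is an assumption of the example, and the log rank test is not included in the sketch.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time having an event.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects leave the risk set
    return curve


def stratify_by_median(levels):
    """Label each subject as above, or at or below, the median level."""
    cutoff = median(levels)
    return ["above" if v > cutoff else "at_or_below" for v in levels]


# Hypothetical tumor T cell levels, follow-up times (months), event flags.
levels = [0.5, 2.0, 3.5, 0.8, 4.1, 1.2]
times = [5, 20, 30, 8, 26, 12]
events = [1, 1, 0, 1, 1, 1]

groups = stratify_by_median(levels)
curves = {}
for label in ("above", "at_or_below"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
print(curves)
```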

FIG. 23A-B are representations of results obtained from analysis of DMRs of leukocyte subtypes.

FIG. 23A shows a heat map of the methylation status for the highest ranked 50 leukocyte DMRs by leukocyte subtype.

FIG. 23B shows a plot depicting the −log10(P-values) for the highest ranked 50 leukocyte DMRs across three cancer data sets (HNSCC; Ovarian; Bladder). P-values (ordinate) show methylation differences between cancer cases and non-cancer controls and were obtained from individual unconditional logistic regression models fit to each of the 50 leukocyte DMRs. For the HNSCC data set, logistic regression models were adjusted for patient age, gender, smoking status (never, former, current), smoking pack years, weekly alcohol consumption, and HPV serology status. The bladder cancer data set was adjusted for patient age, gender, smoking status, smoking pack years, and family history of bladder cancer. The ovarian cancer data set was adjusted for patient age group (55-60, 60-65, 65-70, 70-75 and >75 years). The horizontal dashed line represents −log10(p=0.05).

FIG. 24A-B show results obtained from the DMR profile analysis of the HNSCC data set determining methylation class membership.

FIG. 24A left column shows a heat map of the HNSCC testing data set. Rows represent subjects, which are grouped by predicted methylation class membership. Columns represent the highest ranked 50 leukocyte DMRs that were used to generate the methylation classes for the HNSCC testing set. FIG. 24A right column is a bar-plot depicting the percent cancer case/control across the predicted methylation classes in the HNSCC testing set.

FIG. 24B shows receiver operating characteristic (ROC) curves based on the predicted methylation classes only in the HNSCC testing set and methylation classes including patient age, gender, smoking status (never, former, current), smoking pack years, weekly alcohol consumption, and HPV serostatus.

FIG. 25A-B show results obtained from the DMR profile analysis of the Ovarian data set for determining methylation class membership.

FIG. 25A is a heat map of the ovarian testing data set. Rows represent subjects, which are grouped by predicted methylation class membership. Columns represent the highest ranked ten leukocyte DMRs that were used to generate the methylation classes for the ovarian testing set. FIG. 25A right column is a bar-plot depicting the percent cancer case/control across the predicted methylation classes in the ovarian testing set.

FIG. 25B shows ROC curves based on the predicted methylation classes alone in the ovarian testing set and methylation classes plus patient age group (55-60, 60-65, 65-70, 70-75 and >75 years).

FIG. 26A-B show results obtained from the DMR profile analysis of the bladder data set for determining methylation class membership.

FIG. 26A is a heat map of the bladder testing data set. Rows represent subjects, which are grouped by predicted methylation class membership. Columns represent the highest ranked 56 leukocyte DMRs that were used to generate the methylation classes for the bladder testing set. FIG. 26A right column represents a bar-plot depicting the percent cancer case/control across the predicted methylation classes in the bladder testing set.

FIG. 26B shows ROC curves based on the predicted methylation classes alone in the bladder testing set and methylation classes plus patient age, gender, smoking status (never, former, current), smoking pack years, and family history of bladder cancer.

FIG. 27A-C are graphical representations showing image plots representing the pairwise Spearman correlation coefficients.

FIG. 27A shows the six CpG loci identified by HNSCC analysis (Langevin S M et al., Epigenetics. 2012 March; 7(3):291-9) and the highest ranked 50 leukocyte DMRs used in the present analysis.

FIG. 27B shows the seven CpG loci identified by the alternative ovarian analysis and the highest ranked ten leukocyte DMRs used in the present analysis,

and (c) the nine CpG loci identified by the bladder analysis reported in (Laird P W, 2003 Nat Rev Cancer 3:253-266) and the highest ranked 56 leukocyte DMRs used in the present analysis.

FIG. 27C shows the nine CpG loci identified by the bladder analysis reported in (Shen L et al., 2007 PLoS genetics 3:2023-2036) and the highest ranked 56 leukocyte DMRs used in the present analysis.

FIG. 28 is a schematic diagram showing hierarchy of leukocyte subtypes and sample sizes for each of the leukocyte subtypes used in the analysis for determination of methylation class membership.

FIG. 29 is a diagram representing the analytic workflow of the HNSCC data set (n=184; 92 HNSCC cases and 92 cancer-free controls). The full HNSCC data set was first divided into equally sized training and testing sets. The training sets were used in development of a classifier based on leukocyte DMRs. The resulting classifiers were then used to predict methylation class membership for the observations in the respective independent testing sets. The phenotypic importance of the predicted methylation classes in the testing data was examined subsequently.

FIG. 30 is a diagram representing the analytic workflow of the ovarian cancer data set (n=401; 128 ovarian cancer cases and 273 cancer-free controls). The full ovarian cancer data set was divided into equally sized training and testing sets. The training sets were used in the development of a classifier based on leukocyte DMRs. The resulting classifiers were then used to predict methylation class membership for the observations in the respective independent testing sets. The phenotypic importance of the predicted methylation classes in the testing data was then examined.

FIG. 31 is a diagram representing the analytic workflow of the bladder cancer data set (n=460; 223 bladder cancer cases and 237 cancer-free controls). The full bladder cancer data set was divided into equally sized training and testing sets. The training sets were used in the development of a classifier based on leukocyte DMRs. The resulting classifiers were then used to predict methylation class membership for the observations in the respective independent testing sets. The phenotypic importance of the predicted methylation classes in the testing data was then examined.

FIG. 32 is a diagram illustrating Semi-Supervised Recursively Partitioned Mixture Models (SS-RPMM) for predicting methylation class membership. The full methylation dataset was randomly divided into training and testing sets. Using the training data only, univariate models (adjusted for potential confounders) were used to identify CpG loci whose methylation is most strongly associated with the clinical variable of interest (i.e., case/control status). RPMM is then fit to the training data using the M CpGs that are most associated with the clinical variable of interest (M is determined using a nested cross-validation procedure). The resulting solution is then used in conjunction with an empirical Bayes classifier to predict methylation class membership for the observations in the testing data.
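The first two steps of the workflow of FIG. 32, random division into training and testing sets and ranking of CpG loci by association with case/control status in the training data only, may be sketched in Python as follows. This is a simplified illustration under stated assumptions: the absolute difference in mean methylation between cases and controls stands in for the confounder-adjusted univariate models, M is fixed by hand rather than chosen by nested cross-validation, and the RPMM fit and empirical Bayes classifier are not implemented. All sample identifiers, CpG names, and values are hypothetical.

```python
import random
from statistics import mean


def split_train_test(sample_ids, seed=0):
    """Randomly divide samples into two equally sized sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def rank_cpgs_by_association(beta_values, is_case, train_ids):
    """Rank CpGs by |mean(cases) - mean(controls)| within the training set."""
    scores = {}
    for cpg, per_sample in beta_values.items():
        cases = [per_sample[s] for s in train_ids if is_case[s]]
        controls = [per_sample[s] for s in train_ids if not is_case[s]]
        if not cases or not controls:
            raise ValueError("training set needs both cases and controls")
        scores[cpg] = abs(mean(cases) - mean(controls))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: four samples, three CpG loci.
is_case = {"s1": True, "s2": True, "s3": False, "s4": False}
beta_values = {
    "cpg_a": {"s1": 0.9, "s2": 0.8, "s3": 0.2, "s4": 0.1},
    "cpg_b": {"s1": 0.5, "s2": 0.5, "s3": 0.5, "s4": 0.4},
    "cpg_c": {"s1": 0.3, "s2": 0.4, "s3": 0.6, "s4": 0.7},
}

train_ids, test_ids = split_train_test(sorted(is_case))

# A fixed training subset is used here so the toy ranking always
# contains both a case and a control.
toy_train = ["s1", "s3"]
ranked = rank_cpgs_by_association(beta_values, is_case, toy_train)
top_m = ranked[:2]  # M = 2, chosen by hand for the example
print(top_m)
```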

FIG. 33A-D show results obtained from SS-RPMM analysis (see FIG. 30) of the ovarian cancer data set for determination of methylation class membership.

FIG. 33A is a heatmap of the testing set obtained by predicted methylation class using the SS-RPMM procedure. Rows represent subjects and columns represent the seven CpG loci identified by this analysis.

FIG. 33B represents percentage of cases/controls obtained by predicted methylation class membership in the testing set.

FIG. 33C shows information regarding the seven CpG loci identified by the SS-RPMM analysis.

FIG. 33D shows a ROC/AUC (area under the curve) analysis based on the predicted methylation class memberships in the testing set. Dark represents the ROC/AUC based on the predicted methylation classes alone and light represents the ROC/AUC using the predicted methylation classes and patient age group.

FIG. 34 is a graphical representation showing loci in the gene NKp46 chosen from candidate NK cell-specific differential DNA methylation markers, selected by DNA methylation and mRNA expression criteria.

Linear mixed effects modeling of DNA methylation microarray data from MACS isolated human leukocytes generated a coefficient estimating differential methylation in NK cells relative to other cell subtypes, shown on the abscissa. Linear modeling of mRNA microarray data from the same isolated cells determined log-fold change in expression between NK cells and each of the following subtypes: T cells, B cells, granulocytes and monocytes. The average of these four log-fold change values is shown on the ordinate. Significance for a particular gene region was achieved when q<0.1 for four mRNA expression linear models as well as the DNA methylation mixed effects model. Candidates for NK cell-specific DNA methylation biomarkers were limited to significant gene loci exhibiting decreased methylation in NK cells (methylation estimate<0) and within genes that exhibited increased RNA expression (log fold change>1). The candidate loci are marked with asterisks in the top left quadrant, and NKp46 loci are marked with grey asterisks.
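The selection criteria recited for FIG. 34 may be illustrated by the following Python sketch, which filters loci using q<0.1 in the methylation model and in each of the four expression models, a methylation estimate below 0, and a log fold change above 1. The sketch assumes that the log fold change threshold is applied to the average of the four comparisons; the class name, field names, locus names, and values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Locus:
    name: str
    methylation_estimate: float  # NK cells relative to other subtypes
    methylation_q: float
    log_fold_changes: tuple      # NK vs T, B, granulocyte, monocyte
    expression_qs: tuple         # one q-value per expression model


def is_candidate(locus, q_cutoff=0.1, lfc_cutoff=1.0):
    """Apply the significance, methylation, and expression criteria."""
    significant = locus.methylation_q < q_cutoff and all(
        q < q_cutoff for q in locus.expression_qs
    )
    return (
        significant
        and locus.methylation_estimate < 0
        and mean(locus.log_fold_changes) > lfc_cutoff
    )


# Hypothetical loci for illustration.
loci = [
    Locus("locus_1", -0.6, 0.01, (2.1, 1.8, 2.5, 2.0), (0.01, 0.02, 0.01, 0.03)),
    Locus("locus_2", 0.4, 0.01, (2.0, 2.0, 2.0, 2.0), (0.01, 0.01, 0.01, 0.01)),
    Locus("locus_3", -0.5, 0.01, (1.5, 1.5, 1.5, 1.5), (0.01, 0.20, 0.01, 0.01)),
    Locus("locus_4", -0.5, 0.01, (0.2, 0.3, 0.1, 0.4), (0.01, 0.01, 0.01, 0.01)),
]

candidates = [locus.name for locus in loci if is_candidate(locus)]
print(candidates)
```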

FIG. 35 is a heatmap showing demethylation status of NKp46 determined by methylation specific quantitative PCR (MS-qPCR) of isolated human leukocyte populations. Individual samples of MACS purified white blood cell subtypes were subjected to a MS-qPCR assay that detects demethylated copies of NKp46 DNA. Extent of NKp46 methylation is illustrated in this heatmap in which light indicates that copies of DNA in a particular sample were demethylated in the targeted region of NKp46, and dark indicates that copies were methylated.

FIG. 36 is a line graph showing linearity of NKp46 MS-qPCR calibration. Bisulfite converted universal methylated DNA was used to standardize total amount of DNA in samples at a constant amount. At least three replicates of each standard are plotted. Real time PCR Ct values decrease linearly with ten-fold increase in bisulfite converted NK cell DNA concentration.
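A calibration of the kind shown in FIG. 36 is commonly summarized by fitting a straight line to Ct against the base-10 logarithm of the standard concentration, then inverting the line to read off unknown samples. The sketch below assumes an ordinary least squares fit. The function names and the standard values are hypothetical and do not reproduce the data of FIG. 36.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate the concentration of an unknown."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series (arbitrary concentration units).
standards = [1, 10, 100, 1000]
cts = [35.0, 31.7, 28.4, 25.1]
slope, intercept = fit_standard_curve(standards, cts)
```

A negative slope reflects the decrease in Ct with each ten-fold increase in NK cell DNA concentration.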

FIG. 37 is a bar graph showing prevalence of HNSCC by normal NKp46 demethylation tertile. Normal NKp46 demethylation tertile cutoffs were determined from control blood samples only. Higher tertiles indicate higher NK cell levels. HNSCC prevalence (ordinate) refers to the percent of total cases in this example whose NKp46 demethylation measurements fell within the control derived tertile range. Displayed p-value is from a chi-squared test for trend in proportions.
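The tabulation behind FIG. 37 can be sketched as follows: tertile cutoffs are computed from control measurements only, and each case is then assigned to the control derived tertile into which its measurement falls. This is a minimal sketch that assumes the default interpolation rule of Python's `statistics.quantiles`; the source does not state which quantile rule was used. The demethylation values are hypothetical, and the chi-squared test for trend is not shown.

```python
import statistics


def tertile_cutoffs(control_values):
    """Two cut points dividing the control measurements into thirds."""
    lower, upper = statistics.quantiles(control_values, n=3)
    return lower, upper


def case_percent_by_tertile(case_values, cutoffs):
    """Percent of all cases falling in each control derived tertile."""
    lower, upper = cutoffs
    counts = [0, 0, 0]
    for value in case_values:
        if value <= lower:
            counts[0] += 1
        elif value <= upper:
            counts[1] += 1
        else:
            counts[2] += 1
    total = len(case_values)
    return [100.0 * count / total for count in counts]


# Hypothetical percent demethylation values.
controls = [10, 20, 30, 40, 50, 60, 70, 80, 90]
cases = [5, 12, 25, 30, 50, 80]
cutoffs = tertile_cutoffs(controls)
percents = case_percent_by_tertile(cases, cutoffs)
```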

FIG. 38 is a heatmap showing methylation status of selected NKp46 CpG loci measured by bisulfite pyrosequencing of isolated human leukocytes. The methylation status of eight individual CpG loci near the promoter region of NKp46 was interrogated by pyrosequencing of bisulfite converted DNA extracted from magnetic activated cell sorting (MACS) isolated human leukocyte populations. CpG numbers 2 through 7 represent the six loci targeted in the MS-qPCR assay. This heatmap displays methylation levels at each locus ranging from unmethylated (light) to methylated (dark).

FIG. 39 is a graph showing percent demethylation (ordinate) of a DNA region in NKp46 in control and HNSCC patient blood samples (abscissa) assessed by MS-qPCR. The NKp46 MS-qPCR assay measures the extent of DNA demethylation. A higher level of demethylation indicates a higher level of NK cells within a sample. Wilcoxon rank sum p-value is displayed.

FIG. 40 is a listing of DNA sequences of regions in 96 different genes, each sequence having one CpG dinucleotide shown within square brackets and used to determine methylation status of the gene. The DNA sequence surrounding the CpG dinucleotides was used to design probes for the array and primers for performing the methods for analyzing differential methylation. Also included are the names of the genes, chromosome number indicating the chromosome in which genes are located, the source of the DNA sequences, GenBank accession numbers, and the coordinate of the CpG dinucleotide in respective genes.

FIG. 41A-B are schematic diagrams showing different ways of representing effects on measured DNA methylation due to an exposure or a specific phenotype.

FIG. 41A depicts the marginal effects (β) on measured DNA methylation. The marginal effects are effects which are not adjusted for white blood cell (WBC) distribution.

FIG. 41B depicts the effects on measured DNA methylation adjusted for WBC distribution resulting from exposure or a specific phenotype.

FIG. 42 is a set of graphical representations showing the relationship between α̂ and β̂, the effect on measured DNA methylation not adjusted or adjusted for WBC distribution, for the covariate (e.g. age, current smoker status, toe Arsenic concentration and Dye use) of interest over autosomal CpGs. Dots represent overall methylation as indicated by the first component of the coefficient vector β̂, corresponding to the intercept (Example 38): light=low, black=moderate, dark=high. The diagonal straight line represents identity (α̂=β̂). The curve depicts a loess fit to the scatter plot.

FIG. 43A-B are a graphical representation showing fluorescence intensities of CD3Z gene amplified by digital droplet PCR, and a graphical representation showing concentration of CD3Z gene in PCR samples.

FIG. 43A shows a fluorescence intensity dot plot for amplification of CD3Z gene by detection of intensities of 6 FAM (6-Carboxyfluorescein). Positive and negative droplets are distinguished by a horizontal line.

FIG. 43B shows a correlation of the concentration of copy numbers of CD3Z gene obtained by measuring 6 FAM fluorescence intensities and the expected copy numbers of CD3Z gene obtained by dilution of a known amount of DNA from CD3+ T cells.
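For orientation, the step from a droplet dot plot (FIG. 43A) to a concentration (FIG. 43B) can be sketched with the Poisson correction conventionally applied in droplet digital PCR. The source does not spell out this calculation, so the sketch is an assumption about how such values are typically derived. The threshold, the droplet amplitudes and the nominal droplet volume of 0.85 nl are likewise illustrative assumptions.

```python
import math


def ddpcr_copies_per_ul(amplitudes, threshold, droplet_volume_nl=0.85):
    """Estimate target copies per microliter of reaction from droplet
    fluorescence amplitudes, using a horizontal threshold to call
    positive droplets and a Poisson correction for multiple occupancy."""
    total = len(amplitudes)
    positives = sum(1 for amplitude in amplitudes if amplitude > threshold)
    negatives = total - positives
    if total == 0 or negatives == 0:
        raise ValueError("need at least one droplet and one negative droplet")
    # Mean copies per droplet, from the fraction of empty droplets.
    copies_per_droplet = -math.log(negatives / total)
    droplet_volume_ul = droplet_volume_nl * 1e-3
    return copies_per_droplet / droplet_volume_ul


# Hypothetical run: 500 droplets above and 500 below the threshold.
amplitudes = [8000.0] * 500 + [1500.0] * 500
concentration = ddpcr_copies_per_ul(amplitudes, threshold=4000.0)
```

The correction matters most at high target concentration, where a positive droplet is increasingly likely to hold more than one copy.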

FIG. 44A-B are a graphical representation showing fluorescence intensities of FoxP3 gene amplified by digital droplet PCR, and a graphical representation showing concentration of FoxP3 gene in PCR samples.

FIG. 44A shows a fluorescence intensity dot plot for amplification of FoxP3 gene by detection of intensities of 6 FAM (6-Carboxyfluorescein). Positive and negative droplets are distinguished by a horizontal line.

FIG. 44B shows a correlation of the concentration of copy numbers of FoxP3 gene obtained by measuring 6 FAM fluorescence intensities and the expected copy numbers of FoxP3 gene obtained by dilution of a known amount of DNA from CD3+ T cells.

FIG. 45A-B are a graphical representation showing fluorescence intensities of NKp46 gene amplified by digital droplet PCR, and a table showing concentration of NKp46 gene in the PCR samples amplified under different conditions.

FIG. 45A shows a fluorescence intensity dot plot for amplification of NKp46 gene under different conditions by detection of intensities of 6 FAM (6-Carboxyfluorescein). Positive and negative droplets are distinguished by a horizontal line.

FIG. 45B is a table showing concentration of NKp46 gene in copies/μl determined under different PCR conditions as fractions of methylated control DNA.

FIG. 46A-B are a graphical representation showing fluorescence intensities of NKp46 gene amplified by digital droplet PCR, and a table showing concentration of NKp46 gene in the PCR samples amplified under different conditions.

FIG. 46A shows a fluorescence intensity dot plot for amplification of NKp46 gene by detection of intensities of 6 FAM (6-Carboxyfluorescein). The amplification of demethylated NKp46 locus was performed using C-less and NKp46 DMR specific primers and probes, and results compared. Positive and negative droplets are distinguished by a horizontal line.

FIG. 46B is a table showing concentration of NKp46 gene in copies/μl determined with whole blood DNA, Neutrophil DNA, CD16+CD56^(dim) NK cell DNA and CD16+CD56^(bright) NK cell DNA.

FIG. 47 is a drawing of processing and workflow of 85 venous whole blood samples analyzed in Examples herein. Eighty-five venous whole blood samples were collected from disease free human donors. Of these samples, 79 samples were used for isolation of target cell type by magnetic activated cell separation (MACS) and six samples were subjected to conventional immune profiling in which fresh aliquots were analyzed by protein based methods. Purity was confirmed by fluorescence activated cell sorting (FACS) in 79 samples isolated by MACS. The six samples analyzed by conventional immune profiling were placed in 12 specific different storage conditions that differ by presence of anticoagulants, temperature, and/or duration.

DNA was extracted from each of the 79 samples analyzed by FACS and the 72 samples in the 12 specific storage conditions. Aliquots of the genomic DNA from five of the FACS purified, DNA extracted 79 samples were combined in quantities that mimicked human blood as determined by artificially reconstituting peripheral blood. Aliquots of each of seven of the cell DNA mixtures, the FACS purified DNA extracted 79 samples, and the 72 samples stored according to the 12 specific storage conditions were randomized. Aliquots of each of the resulting 158 samples were contacted with sodium bisulfite, for analysis of methylation status of cytosines in DNA. Aliquots of 58 of these samples were analyzed using a high-density methylation microarray (HDMA) and aliquots of 158 samples were analyzed using a low-density methylation microarray (LDMA).

FIG. 48A-P are a set of graphs of representative FACS results for purified WBC subsets used in examples herein. The lower right quadrant of each panel indicates sample purity. The upper right quadrant of each panel indicates the viability of the cells in the sample.

FIG. 49 is a diagram representing MACS purified WBC subset samples used to establish reference libraries of DNA methylation signatures. Terminal nodes represent the final sample cell types, which were each purified from a specimen of disease-free human blood. The tree diagram indicates the hierarchical relationship of sample cell lineages. Pan* samples were not subsequently selected in the MACS separation process, and therefore contained a biological mixture of subsets within the cell type immediately above them in the tree.

FIG. 50 is a photograph of a clustering heatmap for WBC lineage-specific DNA methylation. DNA methylation signatures distinguishing normal human leukocyte subtypes were obtained using a high-density DNA methylation microarray. Purified WBC subset samples are displayed in FIG. 50 in columns with cell type indicated at the bottom on the x-axis. Individual CpG loci are displayed in rows with the gene containing each locus indicated to the right on the y-axis. Methylation values from completely unmethylated (represented by gray areas) to completely methylated (represented by dark areas) are indicated in the key at the bottom left. Samples and loci were organized according to unsupervised hierarchical clustering.

FIG. 51 is a photograph of DNA methylation signatures distinguishing normal human leukocyte subtypes that was obtained using a custom, low-density DNA methylation microarray. Purified WBC subset samples are displayed in FIG. 51 in columns with cell type indicated at the bottom on the x-axis. Individual CpG loci are displayed in rows with the gene containing each locus indicated to the right on the y-axis. Methylation values from completely unmethylated (represented by gray areas) to completely methylated (represented by dark areas) are indicated in the key at the bottom left. Samples and loci were organized according to unsupervised, hierarchical clustering.

FIG. 52 is a photograph of a crosscheck of purified WBC subset samples that was obtained using a high-density DNA methylation microarray. The quantity of each of seven WBC subsets (displayed on the abscissa) was predicted in the purified WBC subset samples using DNA methylation. The true identity of each purified WBC subset sample is shown on the ordinate, as indicated to the right. Saturation of the interior bins indicates the estimated proportions of WBC subsets, determined using DNA methylation, in purified WBC subset samples, as shown in the key at the bottom right.

FIG. 53 is a photograph of a crosscheck of purified WBC subset samples that was obtained using a custom, low-density DNA methylation microarray. The quantity of each of seven WBC subsets (displayed on the abscissa) was predicted in the purified WBC subset samples using DNA methylation. The true identity of each purified WBC subset sample is shown on the ordinate, as indicated to the right. Saturation of the interior bins indicates the estimated proportions of WBC subsets, determined using DNA methylation, in purified WBC subset samples, as shown in the key at the bottom right.

FIG. 54A-D are graphs showing quantitative reconstructions of leukocyte subsets that were obtained using a high density DNA methylation microarray. In FIG. 54A-D, the abscissa displays quantities of specific WBC subsets determined using DNA methylation. Cell type is indicated by color (light and dark grays) and sample type is indicated by shapes listed in the insets. Lines are drawn from the origin with a slope of one indicating ideal correspondence between the displayed values in each panel. FIG. 54A contains data for DNA from purified WBC subsets that were combined in quantities mimicking human blood under clinical conditions. The expected quantity of each cell type is plotted on the ordinate. Whole blood samples from disease-free human donors were subjected to WBC subset quantification by the described methods. The granulocytes were observed to be the highest percentage of the leukocytes (50-60%) compared to B-cells, T cells, NK cells and monocytes (less than about 40%). FIG. 54B-D are graphs of data for whole blood samples from disease-free human donors subjected to WBC subset quantification by established methods: manual 5-part differential (FIG. 54B); automated 5-part differential (FIG. 54C); and FACS (FIG. 54D). It was observed that the five WBC quantitations measured using DNA methylation were very close to the values expected by other methods. In FIG. 54B-D, the neutrophils had the highest percentage of leukocytes (50-60%) compared to cell types lymphocytes, monocytes, and B cells. The methods herein detected specific, clinically relevant modulations in peripheral blood immune cell composition.

FIG. 55A-D are a set of graphs of quantitative reconstruction of leukocyte subsets using a custom, low density DNA methylation microarray. The abscissa indicates the quantities of specific WBC subsets determined using DNA methylation. Cell type is indicated by shading and sample type is indicated by shape of the datum point, as described in the inset legends. Lines are drawn from the origin with a slope of one indicating ideal correspondence between the displayed values in each panel. The expected quantity of each cell type is indicated by the ordinate. FIG. 55A is a graph of DNA from purified WBC subsets that were combined in quantities mimicking human blood under 19 clinical conditions. In FIG. 55A the granulocytes contained the highest percentage of leukocytes (50-60%) compared to B-cells, T cells, NK cells and monocytes (less than about 20%). FIG. 55B-D are graphs of data for whole blood samples from disease-free human donors subjected to WBC subset quantification by the following methods: manual 5-part differential (FIG. 55B); automated 5-part differential (FIG. 55C); and FACS (FIG. 55D). In FIG. 55B-D, the neutrophils were observed to have the highest percentage of leukocytes (about 60%) compared to other cell types including lymphocytes, monocytes, eosinophils, basophils, T cells, NK cells, and B cells.

FIG. 56A-C are a set of graphs of comparisons of conventional immune cell quantification methods. Cell type is indicated by shading and disease-free human blood donor is indicated by shape of the point, as described in the legends to the right. Lines are drawn from the origin. A slope of one indicates ideal correspondence between the displayed values in each panel. The following methods were compared: manual 5-part differential and CBC with automated 5-part differential (FIG. 56A); manual 5-part differential and FACS (FIG. 56B); and CBC with automated 5-part differential and FACS (FIG. 56C).

FIG. 57A-G are a set of graphs showing Bland-Altman agreement of immune cell quantification methods/assays applied to whole blood samples from disease free human donors. Each data point corresponds to one WBC subset in one blood sample. The mean WBC subset quantity (percent) determined by the two given methods is indicated by the abscissa and the difference between the WBC subset quantities (percent) determined by the two given methods is indicated by the ordinate. The root-mean-square-error (RMSE) value between the two given methods is shown at the top left, in units of WBC subset quantity (percent). The data in FIG. 57A show agreement between measurements obtained from the Low Density Methylation Microarray (LDMA) DNA methylation and known amounts of each of the cell types in laboratory constructed DNA mixtures. FIG. 57B-D contain data that indicate agreement between immune cell quantification using DNA methylation (DNAm) from the custom, low-density DNA methylation microarray and either: manual 5-part differential (FIG. 57B); CBC with automated 5-part differential (FIG. 57C); or FACS (FIG. 57D). FIG. 57E-G contain data that indicate agreement among the following immune cell quantification methods: CBC with automated 5-part differential and FACS (FIG. 57E); manual 5-part differential and FACS (FIG. 57F); and manual 5-part differential and CBC with automated 5-part differential (FIG. 57G).
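The quantities plotted in each Bland-Altman panel follow directly from paired measurements. The sketch below computes, for two methods applied to the same WBC subsets, the per-point mean (abscissa), the per-point difference (ordinate) and the RMSE between the methods. The function name and the example percentages are hypothetical.

```python
import math


def bland_altman(method_a, method_b):
    """Return per-point means, per-point differences (a minus b) and the
    root-mean-square error between two paired series of measurements."""
    if len(method_a) != len(method_b) or not method_a:
        raise ValueError("series must be non-empty and of equal length")
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return means, diffs, rmse


# Hypothetical WBC subset quantities (percent) from two methods.
dna_methylation = [60.0, 30.0, 10.0]
flow_cytometry = [57.0, 34.0, 10.0]
means, diffs, rmse = bland_altman(dna_methylation, flow_cytometry)
```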

FIG. 58 is a diagram showing details of workflow followed in methods herein for whole blood samples from disease-free human donors. The samples were subjected to the following methods of WBC subset quantification to compare to quantitative reconstruction of WBC subsets using DNA methylation by the methods herein. Venous whole blood was collected from a disease free human donor and aliquots of the sample were contacted with heparin, citrate, or EDTA. Each of the heparin, citrate, or EDTA samples was maintained either as a fresh sample or as a sample stored overnight at room temperature, 4° C., or at −80° C. The heparin fresh sample was analyzed for WBC subsets by using flow cytometry, manual differential WBC counting, automated differential WBC counting, a high density methylation microarray (HDMA), or a low-density methylation microarray (LDMA). The other samples, including the citrate and EDTA samples that were fresh or stored overnight at one of room temperature, 4° C., or −80° C., and the heparin samples stored overnight at room temperature, 4° C., or −80° C., were each analyzed for WBC subsets using the HDMA and LDMA.

FIG. 59A-D are a set of graphs showing comparisons of immune cell quantification by DNA methylation for samples treated with different blood anticoagulants and storage conditions. Blood samples were from disease-free human donors. Lines are drawn from the origin with a slope of one indicating ideal correspondence between the displayed values in each panel. Cell type is indicated by shading and shape of the datum point. FIG. 59A shows data for DNA methylation for blood samples treated with citrate (open circle) or EDTA (open square) as an anti-coagulant. FIG. 59B-D show data for DNA methylation for blood samples treated with: heparin (FIG. 59B); EDTA (FIG. 59C); or citrate (FIG. 59D) as an anti-coagulant and stored at different conditions. The cells were stored at room temperature (open circle), at 4° C. (open square), or at −80° C. (open triangle). Comparable WBC subset data were observed for fresh samples treated with different anticoagulants. Further, the WBC subset data for samples stored at room temperature compared to samples stored at 4° C. and −80° C. were observed to be comparable.

DETAILED DESCRIPTION OF THE INVENTION

A model of hematopoiesis includes an early restriction point at which multipotent progenitor cells become committed to either lymphoid or myeloid lineages. The standard methods of distinguishing immune cell lineages are inadequate for fully distinguishing lineage commitment and the process of hematopoiesis.

Epigenetics refers to heritable control of gene expression that occurs without changing the sequence of DNA. Chromatin packaging is a mechanism of epigenetic gene regulation which has been implicated in cell lineage commitment and lineage-specific gene expression. Transcriptionally inactive, or silenced, heterochromatin is more tightly packaged around histone proteins than transcriptionally active euchromatin due to differences in DNA methylation patterns and post-translational histone modifications. Due to its accessibility for measurement, DNA methylation is a marker of chromatin packaging. DNA methylation is largely confined to cytosine residues in CpG dinucleotides which, though underrepresented in the genome, are frequently found in high concentrations called CpG islands. Less methylated CpG islands are highly associated with transcriptional activity and subsequent gene expression, and more methylated CpG islands are highly associated with transcriptional inactivity and gene silencing. Methylation of CpG dinucleotides causes chromatin to become more compact and inaccessible to transcription machinery by moving histones and altering the organization of chromatin and nucleosomes. (Christensen, B. C., et al. 2009, PLoS Genet. 5, e1000602; Schmidl, C., et al. 2009, Genome Res 19, 1165-1174).

In some instances, the overall balance of leukocyte subclasses in circulation or in tissue most prominently influences pathogenesis. For example, incipient cancer cells are recognized and eliminated by cytotoxic T cells (CTLs) and natural killer (NK) cells, whereas tumorigenesis is promoted by certain other inflammatory cells, including B-lymphocytes, mast cells, neutrophils, regulatory T cells (Tregs), and others. These cells have been shown to promote angiogenesis, tumor cell proliferation, tissue invasion and metastasis (Hanahan and Weinberg 2011, Cell, 144, 646-74; Ostrand-Rosenberg, 2008, Curr Opin Genet Dev, 18, 11-18). Likewise, higher levels of NK cells and CTLs circulating in the blood and residing in adipose tissues are associated with lower incidence of metabolic diseases such as type II diabetes (Lynch et al., 2009, Obesity, 17, 601-5), and higher levels of M1 macrophages in adipose tissue can induce inflammation and insulin resistance (Anderson et al., 2011, Curr Opin Lipidol. 21, 172-177). Methods of quantifying the composition of lymphocyte populations can be informative regarding the underlying immuno-biology of disease states as well as the immune response to chronic medical conditions. (Chua et al., 2011, Brit J Cancer 104, 1288-1295).

The methods described herein provide a measurement of individual human or animal immune cell numbers or immune cell ratios in diverse biologic media without the requirement for viable cells or cell sorting or the use of any antibodies or protein markers. The methods are applicable to blood including samples of unsorted blood that is fresh, or is frozen or unfrozen anticoagulant treated peripheral whole blood, finger stick blood, non-anticoagulant treated whole blood, blood clots, isolated mononuclear cells, buffy coat, archival Guthrie card neonatal blood, and to a sample that is a spot, fresh, frozen or is from a tumor such as a formalin-fixed tumor biopsy, and to urine sediment, CNS fluid, fat or other tissue biopsy.

In one embodiment the methods described herein are provided as diagnostic kits for testing laboratories in the form of immune cell specific detection reagents, premixed and optimized plate formatted multiplex assays for immune profiling compatible with specific instrument platforms, applications for in vitro diagnostics of blood, CNS, urine or bronchoalveolar lavage and point of care blood sampling kits for mail-in immune testing and immune monitoring.

The simplified DNA based immuno-diagnostic approach provided herein uses samples that are much smaller volumes of blood than required for earlier methods and that require no processing. These samples can be simply ‘spotted’ onto a solid phase carrier and transported through the mail or delivered using courier.

In another embodiment, the methods described include development of software that can process the output data of immune specific methylation assays to create immune parameter reports by comparison to different reference and control values.

In an alternate embodiment the methods herein describe a discovery platform which is a bioinformatic integration of empirically derived genome wide methylation analyses with publicly available differential gene expression analyses. The merged datasets are then sorted to produce candidates for further examination. The discovery platform is useful to discover clinically useful gene biomarkers.

The methods described herein include a proof-of-principle test of the discovery platform. For the test the goal set was to discover a gene or gene set that provides a marker of CD3+ T cells. The method is applicable to finding a biomarker for any cell. Specifically, the platform identifies gene regions that are ‘demethylated’ within the target cell population (CD3+ T cell) and completely methylated in non-target cells.

To accomplish this discovery phase for the set goal, normal immune cells from the peripheral blood of different individuals were isolated using flow cytometry antibody based cell sorting. Following purification each of the immune cell subtypes was subjected to methylation discovery analysis using the Infinium genome-wide methylation platform (Infinium® HumanMethylation27 Beadchip Microarray, developed by Illumina®, Inc., San Diego, Calif.). The DNA methylation data were then merged with existing gene expression data. Candidates that have high potential to discriminate CD3+ T cells from non-T cells were then further analyzed with two different methylation validation methods (pyrosequencing and quantitative methylation specific PCR, i.e. MethyLight). Finally, a quantitative calibration curve was developed by diluting known and measured numbers of CD3+ T cells into a background matrix of fully methylated lymphocyte DNA. The latter procedure reconstructs the conditions of detection that are present in differentiating CD3+ T cells from a mixture of cells in a complex biological sample.

The methods described herein use individual samples of sorted, normal, human, peripheral blood leukocytes shown in Table 15, Example 13, purchased from AllCells®, LLC (Emeryville, Calif.). These leukocytes were sorted in a column containing antibody-conjugated magnetic beads through a combination of positive and negative selection. DNA from the leukocytes was extracted according to manufacturer's protocol using the DNeasy Blood & Tissue kit (Qiagen), and subjected to bisulfite conversion by treatment with sodium bisulfite using the EZ DNA Methylation Kit (Zymo) following the manufacturer's protocol, thereby converting unmethylated cytosine residues to uracil and leaving methylated cytosine residues intact. DNA methylation is measured using a DNA methylation microarray as described in Example 13.

Huehn et al. (U.S. patent publication number 2007/0269823 A1) describes a method for identifying FoxP3-positive regulatory T cells by analyzing the methylation status of CpG positions in the FOXP3 gene, and further describes a method for diagnosing immune status of a mammal by measuring amounts of regulatory T cells thus identified. CpG methylation analysis of FoxP3 gene is also used to determine the quality of in vitro generated T regulatory cells and for identifying chemical or biological substances that modulate the expression of the FOXP3 gene in T cells. Specific CpG positions in the mouse FoxP3 gene are identified for analyzing methylation status and primers for amplifying mouse and human CpG dense regions in FOXP3 gene are described.

Olek (U.S. patent publication number 2007/0243161 A1) describes a method for pan-cancer diagnostics involving identification of an amount and/or proportion of stable regulatory T cells in a patient suspected of having cancer by analyzing methylation status of CpG positions in the FOXP3 and/or camta1 genes. Increased amount/proportion of stable regulatory T cells in the patient is indicative of an unspecified cancerous disease. A method of treating cancer by reducing the amount or proportion of stable regulatory T cells and a method for diagnosing survival of a cancer patient by measuring T regulatory cell amounts and/or proportions in patients suspected of having cancer using CpG methylation analysis of FoxP3 and/or camta1 genes are described. Increased amounts and/or proportions of stable regulatory T cells in the cancer patient are indicative of a shorter survival.

Olek et al. (International publication number WO 2010/069499 A2) describes a method of identifying T-lymphocytes, in particular CD3+CD4+ and/or CD3+CD8+ cells by analyzing the methylation status of CpG positions in one or more of genes for CD3 multi-protein complex CD3 γ, -δ and -ε, or in other genes. Demethylation is indicative of a CD3+ cell. Olek further describes methods for methylation analysis of CpG positions in CD4+ and/or CD8+ genes, in particular CD8 beta gene, or in other genes, and for determining immune status based on T-lymphocytes identified by methylation analyses, and for monitoring amounts of T-lymphocytes in response to chemical and/or biological substance exposure, in particular CD4+ or CD8+ T lymphocytes.

Shen-Orr et al. 2010, Nature Methods Vol. 7:4, 287-289 describes a cell-type specific significance analysis of microarrays for analyzing differential gene expression for each cell type in a biological sample from microarray data and relative cell type frequencies. In Shen-Orr's method relative abundance of each cell type in a mixed tissue sample is first quantified, and this information is used in combination with microarray gene expression data to deconvolve and compare cell type-specific average expression profiles for groups of mixed tissue samples.

Abbas et al. 2009, PLoS One Vol. 4:7 e6098 describes deconvolution of microarray gene expression data to characterize proportions of cells in a tissue, and further identifies cellular activation patterns in Systemic Lupus Erythematosus.

A method similar to regression calibration is provided herein for determining changes in the distribution of white blood cells between different subpopulations (e.g. cases and controls) using DNA methylation signatures or DNA methylation profiles, in combination with an external validation set having methylation signatures from purified leukocyte samples. The method is demonstrated with Head and Neck Squamous Cell Carcinoma (HNSCC) cases and matched controls, showing that DNA methylation signatures register known changes in CD4+ and granulocyte populations.
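One simple way to picture the role of the external validation set is to treat a whole blood methylation profile as a weighted mixture of the mean profiles of purified cell types and to recover the weights by least squares. The sketch below is an illustrative simplification and not the estimation procedure of the Examples: it solves unconstrained normal equations and then clips negative weights to zero, whereas a constrained projection would ordinarily be preferred. All names and values are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def estimate_proportions(reference, profile):
    """reference[j][k]: mean methylation of purified cell type k at locus j.
    profile[j]: methylation of the mixed sample at locus j.
    Returns least squares mixture weights with negatives clipped to zero."""
    k = len(reference[0])
    gram = [[sum(row[a] * row[b] for row in reference) for b in range(k)]
            for a in range(k)]
    rhs = [sum(row[a] * y for row, y in zip(reference, profile))
           for a in range(k)]
    return [max(0.0, w) for w in solve_linear(gram, rhs)]


# Hypothetical reference signatures for two cell types at four loci,
# and a mixed profile built from 60% of type 0 and 40% of type 1.
reference = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
profile = [0.58, 0.56, 0.42, 0.40]
weights = estimate_proportions(reference, profile)
```

Comparing such estimated weights between cases and controls is the kind of distributional change the method is intended to register.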

Use of DMRs as markers of immune cell identity is employed herein with a high density methylation platform, and a set of analytical tools for estimating the proportions of immune cells in unfractionated whole blood to determine the DNA methylation signature of each of the principal immune components of whole blood (B cells, granulocytes, monocytes, NK cells, and T cells subsets). A form of regression calibration was determined that considers a methylation signature as a high-dimensional multivariate surrogate for the distribution of white blood cells. This distribution was used to predict or model disease states. As a surrogate, the DNA methylation signature was assumed to be a highly correlated measure of leukocyte distribution, and thus fits into the framework of measurement error models, in which the use of a noisy surrogate marker to investigate an association with a disease outcome of interest results in biased estimates, unless internal or external validation data are obtained to “calibrate” the model and correct the bias (Carroll et al., 2006, Measurement error in nonlinear models. Chapman & Hall, Boca Raton, Fla., 2^(nd) edition).

In this case, the problem was complicated by the extremely high dimension of the surrogate. Measurement error problems are formulated as a set of relationships between z, the disease outcome (e.g. case/control status), ω, the gold standard (e.g. leukocyte distribution), and y, the surrogate (e.g. DNA methylation). The quantity E(z|ω) was difficult to estimate due to the cost or logistical complications involved in obtaining ω in a large number of samples. Sufficient data for modeling E(z|y)=ƒ(y) were collected, which provides information about E(z|ω) through the (often imperfect) association E(y|ω)=g(ω), which is inferred from an external validation sample (Thurston et al., 2003, J Stat Plan Inf, 113, 527-34; Carroll et al., 2006, Measurement error in nonlinear models. Chapman & Hall, Boca Raton, Fla., 2^(nd) edition). An additional assumption was that E(z|ω,y)=E(z|ω), i.e. the surrogate provides no information about disease above and beyond the standard for which it serves as a surrogate. The high-dimensional nature of y renders ƒ(y) difficult to formulate. Although multivariate methods of measurement error correction exist, even in a high-dimensional context (e.g. Li and Yin, 2007, Ann Stat, 35, 2143-72), an explicit specification of ƒ(y) is important, which becomes unwieldy as each component of y contributes a small amount of information about z, and both dimension-reduction strategies and constrained regression strategies entail substantial loss of information. In the present context, specification of y=ƒ(z) is natural and straightforward. Consequently, a reversal of the modeling equation is here provided, formulating y=ƒ(z) as part of the modeling strategy, and linking the linear functions ƒ and g in a manner that admits the estimation of ω. In methods herein several major sources of possible bias were identified and methods provided for control and subjection to sensitivity analysis of the sources of the bias.

Examples herein include methods for an estimation technique, theoretical treatment of bias, and a demonstration of the approach through an application to whole blood specimens collected in an example of head and neck squamous cell carcinoma (HNSCC). See FIG. 3. Also provided are methods for a sensitivity analysis, demonstrating the impact of possible biases. Simulation study results are shown in examples herein based on the biology in the samples used.

Examples 1-3 herein show a method for determining changes in distribution of white blood cells between different subpopulations (e.g. cases and controls) from DNA methylation signatures, assuming an external validation set consisting of methylation signatures from purified white blood cell (WBC) samples exists. Examples 4, 10 and 11 herein demonstrate the methodology using a data set of HNSCC cases and matched controls, inferring from DNA methylation assays alone known changes in CD4+ and granulocyte populations between cases and controls and change in CD4+ populations due to aging. Using previous methods, flow cytometry would have been necessary to obtain the same results. A method for assessing the sensitivity of the magnitude estimates to possible biases is also provided. Example 12 validates the method through simulation.

Methods are provided herein for determining changes in the distribution of white blood cell types between different human populations (e.g. cases and controls) using DNA methylation signatures, by using an external validation set having methylation profiles from purified white blood cell components. DNA methylation in peripheral blood was accordingly shown to be a biomarker for clinical and epidemiological investigation. Studies have attempted to distinguish cancer cases from controls using whole peripheral blood assayed with DNA methylation arrays, including ovarian (Teschendorff et al., 2009, PLoS ONE 4, e8274), bladder (Marsit et al., 2011, J Clin Oncol 29, 1133-1139), and pancreatic (Pedersen et al., 2011, PLoS ONE 6, e18223) cancers. Although these studies have demonstrated discrimination of cases from controls, sound evidence for a biological mechanism has been elusive. Presumably, disease associated alterations in blood methylation have several etiological components driven by endogenous genetic, environmental and disease specific factors. From known developmental associated differences in DNA methylation among specific blood cell types, changes in the distributions of blood cell types alone could account for disease associated DNA methylation. The many diverse types of immune cells in blood make this issue highly complex and problematic to tackle using single cell type assays. Therefore, it is important for the development of this new avenue of biomarker research to delineate effects due to the immune cell distribution itself from other “non cell type” alterations in DNA methylation. The differences among human populations attributed to cell distributions are termed “immunologically mediated”.

Immunological explanations for differences in mRNA profiles between cases and controls have been proposed, e.g. Showe et al., 2009, Cancer Res 69: 9202-10 and Kossenkov et al., 2011, Clin Cancer Res 17: 5867-77. The statistical principles described in the method herein apply to mRNA expression profiles and an appropriate validation set S₀ based on mRNA expression arrays. Little to no modification of mathematical expressions and computer code is necessary to apply the statistical principles described in the method herein to analysis of mRNA expression profiles. Under the assumption that the upstream epigenetic control mechanisms are more biologically stable, less variability in measurement of DNA methylation is expected compared with measurement of mRNA expression.

In the methods herein, a solution to partition this component of variation in methylation from other determinants employs multivariate analytic tools including regression coefficients, associated inference, and coefficients of determination measures. These tools were used to evaluate whether the observed DNA methylation differences were due to an immunologically mediated response. Prior measurement error formulations (Thurston et al., 2003, J Stat Plan Inf, 113, 527-34; Li and Yin, 2007, Ann Stat, 35, 2143-2172) require specification of a logistic regression model for case/control status, conditional on DNA methylation signature, a computationally difficult task that is vulnerable to model mis-specifications. A reverse formulation was used herein that naturally models the relationship of DNA methylation conditional on known phenotypes. The formulation respects the protocol (DNA methylation assay data collected after sampling from phenotype groups). Other strategies to formulate errors were found to be unsuccessful. For example, the strategy utilizing the Expectation-Maximization (EM) algorithm to integrate over the missing data ω (Little and Rubin, 2002, Statistical Analysis with Missing Data. Wiley, Hoboken, N.J., 2^(nd) edition) is outside the measurement error literature and within the larger missing-data literature. However, by design, the distribution of ω varied substantially between the data sets S₀ and S₁, severely complicating the approach, with the side-effect of introducing feedback from S₁ to S₀, contaminating the gold-standard status of S₀. Another alternative that was found to be unsuccessful was the simpler approach of an empirical Bayes procedure, similar to existing mixture-model approaches (Koestler et al., 2010, Bioinformatics, 26, 2578-2585). However, difficulty in specifying the distribution of ξ rendered this approach untenable, and in a separate simulation, attempts to impute ω among S₁ samples using parameters obtained from S₀ samples resulted in extremely biased estimates of ω.

Examples herein show that group level comparisons of blood cell DNA methylation revealed significant immune alterations. Methods for individual level immune cell profiling are applicable also, since methods herein are useful also to clinical and detailed analytical epidemiologic applications that examine individual risk factor information. When z_(1i) involves an orthogonal (e.g. one-way ANOVA) parameterization and ordinary least squares (OLS) is used to obtain B₁, then equation 5 (Example 3) herein reduces to simple expressions involving the projected quantities ω_(i)=y_(1i)B₀(B₀^(T)B₀)⁻¹, in which y_(1i) is treated as a row vector. For exploratory purposes, projections ω_(i) serve as estimates of individual profiles. There is interest in minor immune cell fractions and their role in disease, though the signal strength of cell types comprising <5% of the total white cell compartment is difficult to quantitate. Examples of such cell types include the regulatory T cell or NK cell fractions, which are implicated in autoimmune and malignant diseases. Optimization of platforms for technical sensitivity to minor subtypes combined with statistical optimization of signature recognition is needed to enhance the approach for testing highly targeted immune hypotheses.
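The projection of an individual methylation profile onto the reference signatures can be illustrated with a minimal sketch. It assumes that the columns of B₀ hold mean methylation values of purified cell types, that B₀ has full column rank, and that the projection is the ordinary least-squares solution of the normal equations; the function name, the toy reference matrix and the mixing proportions are hypothetical and not taken from the Examples.

```python
def project_profile(y, B0):
    """Least-squares projection of one methylation profile onto reference signatures.

    y  : list of m methylation (beta) values for one unfractionated sample
    B0 : m x d list of lists; column j holds the reference values of cell type j
    Returns the d coefficients w that solve (B0^T B0) w = B0^T y.
    """
    d = len(B0[0])
    # Normal equations: gram = B0^T B0 (d x d), rhs = B0^T y (length d)
    gram = [[sum(row[j] * row[k] for row in B0) for k in range(d)] for j in range(d)]
    rhs = [sum(row[j] * yi for row, yi in zip(B0, y)) for j in range(d)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    aug = [gram[j] + [rhs[j]] for j in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [aug[j][d] for j in range(d)]


# Hypothetical reference signatures: 4 CpG loci (rows) x 2 cell types (columns)
B0 = [[0.9, 0.1],
      [0.1, 0.8],
      [0.5, 0.5],
      [0.2, 0.9]]
# Noise-free toy mixture: 70% of cell type 0 and 30% of cell type 1
true_mix = [0.7, 0.3]
y = [sum(b * w for b, w in zip(row, true_mix)) for row in B0]

omega_hat = project_profile(y, B0)
print([round(w, 6) for w in omega_hat])
```

In this noise-free toy case the projection returns the mixing proportions used to build y. With measured data the coefficients are not constrained to be non-negative or to sum to one, which is one reason the projections are described as exploratory estimates.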

In addition to group level comparisons of blood cell DNA methylation, immune cell profiling at the individual level is important for examining individual risk factors in clinical and detailed analytical epidemiologic applications. As shown in Examples herein, individual immune profiles are theoretically achievable and require extensive validation with a wide array of mixture combinations.

The methods herein have potentially far reaching implications for rapid, simple and complete assessment of the composition of human white blood cell populations, i.e. the immune profile. Currently, assessment of the cellular composition of peripheral blood cannot be accomplished without the use of freshly drawn venous blood that is immediately prepared in a specially equipped laboratory. A complete assessment of the entire immune profile requires extensive flow cytometric measurements based on protein epitopes on leukocyte membranes that distinguish subtypes of immune cells that are either too rare or too similar in appearance to be distinguished using simple microscopic approaches. In particular, flow cytometry is limited by the following: cells must be separated, requiring large volumes of fresh cells; detection can be accomplished only by the fluorescent antibody tags available, which require expensive technology to read; and the outer cell membrane must be intact, which limits utility in many instances.

In contrast, using the methods herein, the application of labor-intensive or expensive steps is required only in the construction of the validation set S₀, which need only be developed once. Once S₀ is available, subsequent interrogation is based on the chemically stable CpG methylation of DNA. Thus the methods herein obviate the need for fresh blood and the preservation of labile protein epitopes. The methods herein are also able to simultaneously assess the individual components of the peripheral blood using a highly multiplexed molecular platform and are therefore logistically straightforward. Furthermore, the statistical methodology used here is implemented easily with the instrumental output of the methylation arrays, which simplifies the interpretation of the immune profile data from the operator's point of view. The methods herein are immediately deployed in a research framework to cost effectively assess human immune profiles (in fresh or archival samples), to explore the potential of the immune profiles to function as biomarkers, and to address key questions regarding disease pathogenesis. Furthermore, the approach used in the methods herein is readily suited for rapid translation to a broad base of clinical applications such as disease monitoring, diagnosis, prognosis, and response to therapy.

The methods herein are applied to tumor biopsies for immune characterization of cancer patients. Other notable applications exist including the application of the test to urine sediments in patients with autoimmune and diabetic kidney disease or in patients undergoing kidney transplantation. Positive detection of T cells in urine sediment is indicative of immune activation and potential kidney disease progression or acute rejection in the context of kidney transplantation.

Populations of blood lymphocytes can be distinguished morphologically on the basis of size and the presence of a granular cytoplasm.

Small lymphocytes, including subsets of T and B cells, are responsible for adaptive immune responses. Sublineages of small lymphocytes are morphologically indistinguishable and are distinguished by cell surface receptors and cellular function. B cells are typically distinguished by expression of the surface molecule CD19. They express immunoglobulins, which are surface receptors for pathogens. In addition, B cells are capable of further differentiating into effector cells called plasma cells. (Parham, P. The Immune System, Garland Science, New York, N.Y., 2005). Differentiated T cells exhibit a complex of surface molecules which function as antigen receptors, referred to as the T cell receptor (TCR) complex. This complex includes the TCR α plus β, or γ plus δ antigen recognition chains, which are associated with invariant chain subunits CD3γ, δ, ε, and ζ. (Zhang, Z., et al. 2007, Blood 109, 4328-4335). In general, T cells are distinguished from other cell lineages by expression of CD3 molecules on the cell surface. The genes that encode CD3 γ, δ, ε, and ζ subunits are CD3G, CD3D, CD3E and CD3Z respectively. The former three genes are tightly clustered on chromosome 11, whereas CD3Z is located on chromosome 1. Differentiated T cells are further divided into two lineages depending on their expression of either CD4 or CD8. The main function of CD8+ T cells, also known as cytotoxic T cells, is to kill infected and transformed cells. The main function of CD4+ T cells is to help other immune cells respond appropriately to sources of infection or malignancy. There are several subsets of CD4+ T cells, including Th1, Th2, Th17 and regulatory T cells. (Parham, P. The Immune System, Garland Science, New York, N.Y., 2005). Regulatory T cells suppress an immune response by influencing the activity of other cell types. They act primarily in the periphery on mature lymphocytes that have exited the main lymphoid tissues and serve as a means of preventing autoimmunity during protective immune responses. Exemplary regulatory T cells are thymus-derived CD4+CD25+Foxp3+ T cells, commonly referred to as Tregs. (Zou, W. 2006, Nat Rev Immunol 6, 295-307). These cells primarily function to maintain peripheral self-tolerance. (Cesana, G. C., et al., 2006, J Clin Oncol 24, 1169-1177). Forkhead Box P3 (FOXP3), a transcription factor expressed by Tregs, is an important developmental and functional factor that regulates Treg immunosuppressive functions. (Janson, P. C., Winerdal, M. E. & Winqvist, O. 2009, Biochim Biophys Acta 1790, 906-919; Zou, W. 2006, Nat Rev Immunol 6, 295-307).

Natural killer (NK) cells are large CD56+ lymphocytes with a granular cytoplasm. They enter infected or malignant tissue to kill damaged cells and secrete cytokines aimed at preventing the spread of disease to other cells or tissues. Thus, NK cells act as effector cells of innate immunity. A subset of CD56+ NK cells that express CD3 surface molecules are NKT cells.

To determine if distinct methylation profiles are indeed associated with leukocyte lineages, statistical clustering of methylation patterns was performed using a modified model-based form of unsupervised clustering known as recursively partitioned mixture modeling (RPMM). (Houseman, E. A., et al. 2008, BMC Bioinformatics 9, 365).

A locus by locus comparison was performed in which putative leukocyte DMRs were identified from Infinium data in SAS version 9.1 using a macro for locus-by-locus linear modeling that adjusts for control probe and beadchip plate. Infinium beta values for Group 1 leukocyte samples were compared to Infinium beta values for Group 2 leukocyte samples, in which group membership for each phase of the comparison is shown in Table 1.

TABLE 1 Locus by locus comparison groups

Phase        Group 1 Leukocytes             Group 2 Leukocytes
Phase I      CD3+, Pan-T, CD4, Treg, CD8    NK, B, Mono, Gran, Neut
Phase II     NK                             Pan-T, CD4, Treg, CD8, B, Mono, Gran, Neut
Phase III    CD8                            CD4, Treg, NK, B, Mono, Gran, Neut

Resultant t-values from each comparison were converted to p-values in R version 2.11.1 of Illumina's software which provides convenient mechanisms for loading and analyzing the results of methylation status, and for quality control and basic visualization tasks.

False discovery rate estimation and Q-values were computed by the Q-value package in R to adjust for multiple comparisons. (Significance was characterized as Q≦0.05.)
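The idea of adjusting locus-level p-values for multiple comparisons can be illustrated with a minimal sketch. Note that this sketch swaps in the Benjamini-Hochberg step-up adjustment, a simpler and generally more conservative relative of the Q-value estimate named above; it is not the Q-value package, and the p-values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five CpG loci
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = bh_adjust(pvals)
significant = [i for i, q in enumerate(adj) if q <= 0.05]
print(adj, significant)
```

Here only the first two loci pass the 0.05 cutoff after adjustment, although four of the five raw p-values are below 0.05.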

For significant CpG loci (Q≦0.05), a negative t-value indicates the locus putatively represents a DMR that is unmethylated in group 1 leukocyte lineage(s) and methylated in group 2 leukocyte lineage(s). Conversely, a positive t-value indicates that the locus putatively represents a DMR that is methylated in group 1 leukocyte lineages and unmethylated in group 2 leukocyte lineages. A DMR that is unmethylated in the leukocyte lineage(s) of interest and methylated in other leukocyte lineages would make the best epigenetic biomarker, since unmethylation is associated with transcriptional activity whereas methylation is associated with transcriptional silencing. Therefore, significant CpG loci exhibiting negative t-values are preferred.
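The selection rule above (significant at Q≦0.05 and a negative t-value) can be sketched as a simple filter. The record type, the locus identifiers and the numeric values are hypothetical and serve only to show the rule.

```python
from typing import NamedTuple


class LocusResult(NamedTuple):
    locus_id: str    # hypothetical identifier for a CpG locus
    t_value: float   # group 1 versus group 2 comparison statistic
    q_value: float   # multiple-comparison adjusted significance


def preferred_loci(results, q_cutoff=0.05):
    """Keep significant loci with negative t (unmethylated in group 1),
    ordered from the most negative t-value upward."""
    hits = [r for r in results if r.q_value <= q_cutoff and r.t_value < 0]
    return sorted(hits, key=lambda r: r.t_value)


results = [
    LocusResult("locus_A", -12.4, 0.001),  # significant, unmethylated in group 1
    LocusResult("locus_B", 9.8, 0.002),    # significant, but methylated in group 1
    LocusResult("locus_C", -2.1, 0.200),   # negative t, not significant
    LocusResult("locus_D", -6.3, 0.040),   # significant, unmethylated in group 1
]
candidates = [r.locus_id for r in preferred_loci(results)]
print(candidates)
```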

In the methods herein, results of locus by locus comparisons were merged with cell type specific gene expression data (Palmer et al., 2006, BMC Genomics 7, 115; Du et al., 2006, Genomics 87, 693-703; and Hashimoto et al., 2003, Blood 101, 3509-3513) to identify putative DMRs that are in genes associated with altered expression by Group 1 leukocyte lineages compared to Group 2 leukocyte lineages. An exemplary candidate epigenetic biomarker of a specific leukocyte lineage is an unmethylated region of a gene that is highly expressed by the leukocyte lineage, and not expressed by other cell types, such as lineage-specific surface molecules, obligate differentiation proteins, and secreted factors. A further candidate is a methylated region of a gene that is not expressed by the leukocyte lineage and is expressed by other cell types. Without being limited by any theory or mechanism of action, these scenarios correlate with chromatin packaging, so that differential DNA methylation plays a large role in regulating leukocyte lineage specific expression of the gene. If no leukocyte lineage specific difference in expression of the gene containing a putative DMR were observed, other modes of gene regulation such as activators, repressors, and enhancers overshadow the role of chromatin packaging in regulating expression of the gene. Alternatively, such a gene is expressed in a temporally or environmentally specific manner that was not elucidated by the gene expression candidate data. Such a putative DMR would not be an ideal target to explore as an epigenetic biomarker of that leukocyte lineage.

In the methods described herein DMR validation is performed for each putative DMR identified from array data using bisulfite pyrosequencing and/or MethyLight quantitative real time PCR assays that measure DNA methylation of the gene region in sorted human leukocyte samples shown in Table 15, Example 13. Bisulfite pyrosequencing assays were designed using Pyromark Assay Design 2.0 (Qiagen), and carried out on a Pyromark MD pyrosequencer running Pyromark qCpG software (Qiagen). Oligonucleotide primers were obtained from Invitrogen™ by Life Technologies™. The gene regions of interest were PCR amplified from bisulfite converted DNA using a biotinylated reverse primer and an unlabelled forward primer. The biotinylated PCR product was complexed with sequencing primers that anneal upstream from the target region, and was then incubated with enzymes and substrates. Then, dNTPs were dispensed in a specific order and light emitted with the incorporation of each nucleotide was measured with a CCD camera. Methylation was quantified by calculating the ratio of cytosine (methylated) to thymine (unmethylated) at each CpG locus.
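The quantification step can be illustrated with a minimal sketch, assuming that methylation at each CpG is reported as the cytosine signal expressed as a percentage of the combined cytosine and thymine signal. That convention, the function name and the peak heights are assumptions for illustration and are not taken from the instrument software.

```python
def percent_methylation(c_signal, t_signal):
    """Percent methylation at one CpG from pyrosequencing peak heights.

    c_signal: light signal for cytosine incorporation (methylated template)
    t_signal: light signal for thymine incorporation (unmethylated template)
    """
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this CpG position")
    return 100.0 * c_signal / total


# Hypothetical (C, T) peak heights at three CpG positions of one amplicon
peaks = [(12.0, 88.0), (5.0, 95.0), (50.0, 50.0)]
per_cpg = [percent_methylation(c, t) for c, t in peaks]
region_mean = sum(per_cpg) / len(per_cpg)
print(per_cpg, round(region_mean, 2))
```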

In the methods described herein methylation status of specific gene regions was calculated using MethyLight according to the protocol described by Campan et al. 2009, Methods Mol Biol 507, 325-337, with the following modifications: C-less primers and probe were used to determine total DNA input for each sample and control reference rather than ALU-C4 primers and probe. To measure unmethylation, control unmethylated DNA was used as a reference, generating a percent unmethylated reference value which is subsequently converted into percent methylation. Real time PCR primers and fluorescent minor groove binder (MGB) probes were obtained from Applied Biosystems (Foster City, Calif.). TaqMan® Universal PCR Mastermix, no AmpErase® UNG was obtained from Applied Biosystems, manufactured by Roche (Branchburg, N.J.). Quantitative, real time PCR reactions were performed with Applied Biosystems 7300 Real Time PCR System using Applied Biosystems 7300 system sequence detection software version 1.4.0.25©2001-2006.
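The arithmetic behind a percent unmethylated reference value can be sketched as follows. This is an illustration under stated assumptions, not the protocol itself: it assumes the usual MethyLight ratio-of-ratios form (target quantity normalized to total input, relative to the same ratio in the reference DNA) and assumes the conversion to percent methylation is 100 minus the percent unmethylated value, capped to the 0 to 100 range. All function names and quantities are hypothetical.

```python
def percent_of_reference(target_sample, input_sample, target_ref, input_ref):
    """Ratio-of-ratios percentage relative to a reference DNA.

    target_*: quantity from the unmethylated-specific reaction
    input_* : quantity from the C-less (total DNA input) reaction
    """
    return 100.0 * (target_sample / input_sample) / (target_ref / input_ref)


def percent_methylation_from_pur(pur):
    """Assumed conversion: percent methylation = 100 - percent unmethylated,
    after capping the percent unmethylated value to the 0-100 range."""
    capped = max(0.0, min(100.0, pur))
    return 100.0 - capped


# Hypothetical qPCR quantities (arbitrary units from standard curves)
pur = percent_of_reference(target_sample=200.0, input_sample=1000.0,
                           target_ref=500.0, input_ref=1000.0)
pct_methylated = percent_methylation_from_pur(pur)
print(round(pur, 2), round(pct_methylated, 2))
```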

In the methods herein, a putative DMR identified as being unmethylated in group 1 leukocytes based on Infinium methylation data was shown using bisulfite pyrosequencing or MethyLight® qPCR to be unmethylated in group 1 leukocytes and methylated in group 2 leukocytes and the DMR was confirmed as an unmethylated epigenetic biomarker specific to the group 1 leukocyte lineage(s). A putative DMR shown using bisulfite pyrosequencing or MethyLight® qPCR to be unmethylated in group 1 leukocytes and in some group 2 leukocytes, was not confirmed as an epigenetic biomarker specific to the group 1 leukocyte lineage(s). Instead that DMR represents an epigenetic biomarker of several different human leukocyte lineages including the group 1 lineage(s). A DMR that is partially unmethylated by bisulfite pyrosequencing or MethyLight® qPCR in group 1 leukocytes and methylated in group 2 leukocytes, is a weak epigenetic biomarker of the group 1 leukocyte lineage(s). That DMR is heterogeneously unmethylated in group 1 leukocytes and is homogeneously methylated in group 2 leukocytes and is therefore not useful for distinguishing group 1 from group 2 leukocyte lineages.

If Infinium data suggested that a CpG locus represents a DMR specific to group 1 leukocytes, and bisulfite pyrosequencing or MethyLight qPCR did not find a difference in DNA methylation in that region between group 1 and group 2 leukocyte samples, the region was not considered a DMR that would serve as an epigenetic biomarker of the group 1 leukocyte lineage(s).
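The validation outcomes described above can be sketched as a small decision function. The numeric cutoffs used to call a region "unmethylated" (at most 20%) or "methylated" (at least 80%) are hypothetical and chosen only for illustration; no thresholds are stated in the text.

```python
def classify_dmr(group1_pct, group2_pcts, low=20.0, high=80.0):
    """Classify a putative DMR from validation-assay percent methylation.

    group1_pct : percent methylation in the group 1 lineage
    group2_pcts: percent methylation in each group 2 lineage
    low, high  : hypothetical cutoffs for unmethylated / methylated calls
    """
    g1_unmethylated = group1_pct <= low
    g1_partial = low < group1_pct < high
    g2_all_methylated = all(p >= high for p in group2_pcts)
    g2_some_unmethylated = any(p <= low for p in group2_pcts)

    if g1_unmethylated and g2_all_methylated:
        return "confirmed: unmethylated biomarker specific to group 1"
    if g1_unmethylated and g2_some_unmethylated:
        return "not specific: marks several lineages including group 1"
    if g1_partial and g2_all_methylated:
        return "weak biomarker of group 1"
    return "not confirmed"


print(classify_dmr(5.0, [90.0, 95.0, 88.0]))
print(classify_dmr(5.0, [90.0, 10.0, 88.0]))
print(classify_dmr(50.0, [90.0, 95.0]))
print(classify_dmr(90.0, [90.0, 92.0]))
```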

These discovery platform criteria successfully identified a unique heretofore unknown sequence of genomic DNA that is specifically marked by CpG demethylation in CD3 positive T cells, not in other hematopoietic peripheral blood cells (FIG. 10B). In examples herein it is further shown that the DNA methylation status of this region in the promoter of the CD3Z gene in sorted human peripheral blood leukocytes, measured by MethyLight® qPCR, confirms that the identified genomic sequence is an immune cell type specific differentially methylated region that is a useful marker to quantify CD3+ T cells in biological specimens such as whole or separated blood and other tissues.

Gliomas are a histologically diverse cancer with few established risk factors and poor prognoses (Kleihues et al. 1993, Brain Pathol 3(3): 255-68; Ohgaki and Kleihues 2005, Acta Neuropathol 109(1): 93-108; Louis et al. 2007, Acta Neuropathol 114(2): 97-109; Ohgaki and Kleihues 2007, Am J Pathol 170(5): 1445-53). However, immune factors are associated with increased glioma risk and are also thought to play a role in patient outcomes (Wiemels et al. 2009, Int J Cancer 125(3): 680-7; Yang et al. 2010, J Clin Neurosci 17(11): 1381-5). Patients with glioblastoma multiforme (GBM) exhibit abnormalities (McVicar et al., 1992, J Neurosurg 76(2): 251-60; Ashkenazi et al. 1997, Neuroimmunomodulation 4(1): 49-56) of T cell response associated with pronounced reductions in T cell numbers in peripheral blood including the suppressive regulatory T cells (Tregs) (Fecci, et al., 2006, Cancer Res 66(6): 3294-302). Despite low T cell and Treg counts, the ratio of Tregs to T cells is clinically relevant in immunosuppression. Currently there is no validated method to quantify this ratio. The quantification of immunosuppression is envisioned herein to help also in characterizing patient tumors. An immunosuppressive environment in glioma is also suggested by the accumulation of tumor infiltrating lymphocytes (TILs) displaying markers of Tregs (i.e. cell membrane CD4 and CD25 and intracellular staining of the FOXP3 protein).

Epigenetic markers involving the demethylation of the FOXP3 gene have been determined to be the most specific marker of stable Tregs. (Baron et al., 2007, Eur J Immunol 37(9): 2378-89; Floess et al., 2007 PLoS Biol 5(2): e38; Polansky et al., 2008, Eur J Immunol 38(6): 1654-63). As described in examples herein, by combining information about the FOXP3 differentially methylated region (DMR) with methylation specific quantitative PCR (MS-qPCR), highly sensitive and accurate counts of Tregs in blood and tissues were obtained. Such DNA-based methods to interrogate specific populations of T cell subsets are far less expensive than flow-cytometry and can be applied to archival specimens. Examples herein show that the DMR marker for CD3+ T cells identified herein is used alone or in conjunction with the previously described Treg DMR marker.

A quantitative assay for CD3+ T cells based on the demethylation of the promoter of a component of the T cell receptor complex, CD3Z (CD247), is also described herein. Examples herein show the validity of CD3Z demethylation as a CD3+ T cell marker and illustrate its application in patients with glioma that demonstrate the high discriminating value of CD3Z demethylation in glioma case-control subject comparisons, histopathological characterization of tumors and patient prognosis.

An understanding of the role played by an altered immune response in etiology facilitates development of more effective therapies and prognostic indicators. Epidemiological studies implicate atopic immune alterations in glioma risk (Wrensch et al., 2005, Am J Epidemiol 161(10): 929-38; Schwartzbaum et al., 2010, Carcinogenesis 31(10): 1770-7). Immune suppression and abnormalities in T cells in glioma patients may prevent antitumor immunity and pose barriers to effective immunotherapeutic strategies (Grauer et al., 2007, Int J Cancer 121(1): 95-105; Sonabend et al., 2008, Anticancer Res 28(2B): 1143-50). Data obtained using novel T cell epigenetic assays described in examples herein demonstrate dramatic decreases in CD3+ T cells and Tregs in peripheral blood from GBM patients. The copy numbers of demethylated CD3Z and FOXP3, as a percent of total leukocyte copies, were observed to be reduced about two-fold in GBM patients, which was highly statistically significant.
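The quantities discussed above (demethylated copies as a percent of total copies, and the Treg to T cell ratio) reduce to simple arithmetic, sketched here with hypothetical copy numbers. The sketch assumes both markers are normalized to the same total-copy measurement and ignores any marker-specific corrections that a real assay might require.

```python
def percent_of_total(demethylated_copies, total_copies):
    """Demethylated marker copies as a percent of total genomic copies."""
    if total_copies <= 0:
        raise ValueError("total copy number must be positive")
    return 100.0 * demethylated_copies / total_copies


# Hypothetical MS-qPCR copy numbers for one blood sample
total = 10000.0
cd3z_pct = percent_of_total(3000.0, total)   # proxy for CD3+ T cells
foxp3_pct = percent_of_total(150.0, total)   # proxy for Tregs
treg_to_t_ratio = foxp3_pct / cd3z_pct

print(cd3z_pct, foxp3_pct, round(treg_to_t_ratio, 3))
```

A two-fold reduction of both markers would halve each percentage while leaving the ratio unchanged, which is why the ratio is reported separately from the absolute levels.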

Validation studies herein support the notion that the CD3Z MS-qPCR assay using unprocessed archival whole blood is an accurate reflection of T cells as measured by conventional flow cytometry. Previous studies have validated the FOXP3 demethylation assay as a measure of Tregs in blood and tissues (Baron et al., 2007, Eur J Immunol 37(9): 2378-89). Current steroid use (dexamethasone), temozolomide and radiation exposures were investigated as possible factors in these effects among cases, but no significant associations of any factor with these T cell alterations were found. The methods described in examples herein that delineate T cell subsets from DNA facilitate immune cell analyses using blood specimens that have been archived in cohort populations with long-term glioma follow-up data. Nested case control studies within large epidemiologic cohorts are now feasible as a result, making it possible, for the first time, to test whether T cell and Treg abnormalities precede the diagnosis of glioma.

The balance of suppressive Tregs to total T cells in peripheral blood has been reported to be shifted towards greater suppression in GBM patients and other types of cancer (Beyer and Schultze, 2006, Blood 108(3): 804-11). The ratio of Tregs to T cells in association with cigarette smoking was examined herein. An association of current smoking with higher Treg/T cell ratios was observed. There is strong evidence that cigarette smoke exposure leads to the accumulation of Tregs in respiratory airways in mice (Brandsma et al., 2008, Respir Res 9: 17) and humans (Smyth et al., 2007, Chest 132(1): 156-63) as well as in the gut epithelium of exposed mice (Verschuere et al., 2011 Lab Invest. 91(7):1056-67). Treg/T cell ratios were herein observed to be higher in current smokers versus former smokers (FIG. 16). It was subsequently confirmed in an independent population that current but not former cigarette smokers exhibit higher Treg/T cell ratios. Results herein illustrate the need for examination of patient characteristics to include cigarette smoking in diseases that affect Treg levels. New epigenetic methods described herein are useful in promoting these types of studies.

Similar to many types of cancer, CD4+ T helper cells and Tregs have been shown to infiltrate the human glioma tumor microenvironment (Nishikawa and Sakaguchi, 2010, Int J Cancer 127(4): 759-67). In glioma studies using immunohistochemistry (IHC) to quantify T cells in FFPE preparations, CD4+ T cell numbers were reported to increase with tumor grade, whereas CD8+ T cells appear in equal frequencies across glioma grades (Heimberger et al., 2008, Clin Cancer Res 14(16): 5166-72). Results herein indicate increased CD3Z demethylated cells according to grade (FIG. 17). IHC analysis herein showed that these cells were mostly CD8+ cells with very few CD4+ cells. Examples herein also show that ependymal tumor cells and some significant fraction of grade II Oligodendrogliomas (OD) and Astrocytomas (AS) tumors contain significant numbers of T cells and Tregs (FIG. 21). As progression of lower grade to higher grade brain tumors is a common and serious clinical problem, results herein show that epigenetic analyses are useful for characterizing low grade OD and AS tumors as well as Ependymomas (EP). Compared to previous reports (El Andaloussi and Lesniak, 2006, Neuro Oncol 8(3): 234-43; El Andaloussi and Lesniak, 2007, J Neurooncol 83(2): 145-52; Heimberger et al., 2008, Clin Cancer Res 14(16): 5166-72; Heimberger et al., 2008, Neuro Oncol 10(1): 98-103), analysis herein using the MS-qPCR showed significantly increased ratio of Treg/CD3+ T cells within glioma tumor tissues of different pathological grade (FIG. 17). Results herein showed also how the ratio of Tregs/CD3+ T cells increases with tumor grade in comparison to blood. Thus, until the present results, there was no evidence of a specific accumulation of Tregs in human brain tumors. The survival data in examples herein show significant associations of immune parameters with patient survival (FIG. 22).

Without being limited by any theory or mechanism of action, observations herein of a close linear relationship between flow cytometry of CD3+ T cells and CD3Z demethylation that was identical among glioma cases and controls argue against a cancer related effect on CD3Z demethylation such as downregulation of CD3Z through a posttranslational effect on CD3Z proteins mediated by up regulation of lysosomal or proteasomal degradation pathways. Another issue concerning the validity of CD3Z demethylation as a CD3+ T cell marker in cancer tissues is that DNA demethylation may take place in transformed cells and thus ‘mimic’ a lymphocyte signal. To ascertain that the observed CD3Z demethylation was taking place in CD3+ T cells and not due to DNA demethylation taking place in transformed cells, CD3Z and FOXP3 demethylation was assessed in brain tumor cell lines and in human GBM xenografts, which cannot contain human T cells. These samples contained non-detectable levels of CD3Z or FOXP3 demethylation. Normal brain tissue was also uniformly devoid of T cell signals, consistent with the specificity of the MS-qPCR in tumor as reflecting infiltration of immune cells. Some subtypes of NK cells (CD56^(dim)CD16^(bright)) utilize CD3Z in NK receptor signaling (Lanier, 2006, Trends Cell Biol 16(8): 388-90). The contribution of CD3Z expressing and demethylated NK cells to the overall CD3Z demethylated signal in peripheral white blood cells is estimated to be very small. Furthermore, NK cells have not been observed in glioma tissues.

The fundamental innovation in the epigenetic analyses described herein is a shift in immunodiagnostics away from proteomic-based approaches to one that is based on quantifying cell type specific DNA methylation events. This new approach produces gains in versatility, sensitivity, feasibility and throughput compared with conventional flow cytometry or IHC and does so at a lower cost. The high chemical stability of cytosine methylation marks within genomic DNA and the fact that differentiation within the immune system is tightly linked with gene specific DNA methylation events make quantification of immune cells through epigenetic analyses a unique approach. The method combines the intrinsic chemical stability of DNA with the high sensitivity of qPCR methods. Automation and liquid robotic handling in processing and analysis add further to the power of the methodology and open avenues for investigations in the immunoepidemiology of glioma and many other diseases.

Methods herein show that blood-based DNA methylation signatures across a complex cellular mixture of WBCs are useful for distinguishing solid tumor cancer cases in which there are well-defined immune-mediated responses and controls. As tumorigenesis elicits a distinct immune response (Camilleri-Brot S et al., 2004, Ann Oncol 15:104-112; Wang Y et al., 2005, Am J Clin Pathol 124:392-401; Rui L et al., 2011 Nat Immunol 12:933-940), the result is a hematopoietic shift in WBC populations, which can be precisely discerned by applying the unique epigenetic signature of differing lineages. The aggregate methylation signature in blood that distinguishes cancer cases from controls corresponds to the epigenetic signatures that define leukocyte subtypes.

To understand the role of immune-mediated responses to tumorigenesis in defining distinct signatures of blood-based DNA methylation between cancer cases and cancer-free controls in examples herein, the epigenetic landscape of WBCs was obtained by identifying DMRs among leukocyte subtypes. This analysis revealed that the majority of the highest ranking 50 leukocyte DMRs (Example 25) were differentially methylated between disease cases and normal controls for HNSCC and ovarian cancers, with a smaller fraction differentially methylated between bladder cancer cases and controls. Among the eight overlapping CpG loci that were found to be significantly differentially methylated between cancer cases and controls across the three data sets, the direction of the relationships was similar for HNSCC and ovarian cancer cases compared to controls. These findings show that HNSCC and ovarian cancer elicit similar shifts in leukocyte compositions in the hematopoietic system.

Of the seven overlapping DMRs (CD72, PACAP, FGD2, SLC22A18, GSTP1, NFE2, ASGR2), several are located within genes with either established or alleged involvement in immune differentiation or function, viz., CD72, PACAP and FGD2 (Kumanogoh and Kikutani, 2001, Trends Immunol 22:670-676; Parnes and Pan, 2000, Immunol Rev 176:75-85; Tan et al., 2009, Proc Natl Acad Sci 106:2012-2017; Huber C et al., 2008, J Biol Chem 283:34002-34012). CD72, a member of the C-type lectin superfamily, negatively regulates B cell coreceptor signaling (Kumanogoh and Kikutani, 2001) and has been shown to act as a unique inhibitory receptor on NK cells regulating cytokine production (Alcon V L et al., 2009, Eur J Immunol 39:826-832). Moreover, PACAP has been implicated as an intrinsic regulator of regulatory T cell abundance after inflammation, and FGD2 has been shown to play a role in leukocyte signaling and vesicle trafficking in cells specialized to present antigen in the immune system (Huber C et al., 2008, J Biol Chem 283:34002-34012).

In the model described herein containing the DNA methylation profile for the highest ranking 50 leukocyte DMRs, patient age, gender, smoking status, smoking pack years, weekly alcohol consumption, and HPV serological status (Table 19, Example 13), HNSCC was predicted with a high degree of sensitivity and specificity. Similarly high prediction performance was obtained for ovarian cancer using the DNA methylation profile for the highest ranking ten leukocyte DMRs and patient age group. Prediction performance for bladder cancer, based on the methylation profile of the highest ranking 56 DMRs, patient age, gender, smoking status, smoking pack years, and family history of bladder cancer, was lower than that observed for HNSCC and ovarian cancer. One explanation for the differences among cancer types in the ability to discriminate cancer cases from controls is an underlying difference in the magnitude of the shift in leukocyte subtypes. Cancers characterized by a pronounced immunologic response, such as HNSCC and ovarian cancer (Alhamarneh O et al., 2008, Head Neck 30:251-261; Zhang L et al., 2003, N Engl J Med 348:203-213; Tomsova M et al., 2008, Gynecol Oncol 108:415-420; Sato E et al., 2005, Proc Natl Acad Sci 102:18538-18543; Curiel T J et al., 2004, Nat Med 10:942-949), correspond to more discernible shifts in leukocyte sub-populations, thus resulting in greater discrimination of blood-derived DNA methylation using leukocyte DMRs for these cancers compared to bladder cancer.

Substantial correlation was also obtained between methylation of the loci identified via the semi-supervised recursively partitioned mixture model (SS-RPMM) analyses and the leukocyte DMRs that defined the methylation classes discovered for the HNSCC and ovarian data sets. A diagram illustrating the analytic framework for SS-RPMM is provided in FIG. 32. The SS-RPMM procedure is specifically designed to construct methylation classes that are based on an optimal number of informative features (loci whose methylation is most strongly associated with cancer case/control status). The results demonstrate that the methylation classes identified through SS-RPMM for the HNSCC and ovarian data sets are in large part due to systematic hematopoietic changes in WBC populations in response to tumorigenesis. The 56 leukocyte DMRs used in the bladder profile analysis were less correlated with the nine CpG loci identified via the previously reported SS-RPMM analysis of this data set (Marsit C J et al., 2011, J Clin Oncol 29:1133-1139). Alternative biological epigenetic mechanisms may be operative in bladder cancer in addition to the epigenetic signatures characteristic of leukocyte subtypes, and contribute independently to the blood-derived differences in DNA methylation between bladder cancer cases and controls.

Examples herein provide evidence that observed differences in blood-derived DNA methylation in cancer cases are largely explained by systematic differences in the methylation signatures of leukocyte sub-populations. These findings signify that different cancers elicit a discernible, unique immune response evident in peripheral blood. These results have important implications for research into the immunology of cancer. Further, the approach of observing differences in blood-derived DNA methylation provides a completely novel tool for the study of the immune profiles of diseases where only DNA can be accessed; that is, this approach has utility not only in cancer diagnostics and risk-prediction, but can also be applied to future research (including stored specimens) for any disease where the immune profile holds medical information. The approach represents an extremely simple, yet truly powerful and important new tool for medical research and may serve as a catalyst for future non-invasive disease diagnostics.

Natural killer (NK) cells are a key element of the innate immune system implicated in human cancer. To examine NK cell levels in archived blood samples from a study of human head and neck squamous cell carcinoma (HNSCC), a DNA-based quantification method described in methods herein was developed (Examples 27-36).

Head and neck squamous cell carcinoma (HNSCC) is strongly associated with alterations in the immune system and it is postulated that progression of HNSCC tumors is linked to immune evasion or failure of the immune system to fight the cancer (Duray A, et al., 2010, Clinical & developmental immunology, 2010:701657; Pries R, and Wollenberg B, 2006, Cytokine Growth Factor Rev, 17:141-6; Wulff S et al., 2009, Anticancer research, 29:3053-7; Kuss I et al., 2004, Clin Cancer Res, 10:3755-62; Kuss I et al., 2005, Adv Otorhinolaryngol, 62:161-72). Natural killer (NK) cells are of particular interest in the context of HNSCC and other cancers, since they are able to recognize and destroy pre-cancerous and malignant cells (Kim R et al., 2007, Immunology, 121:1-14; Ostrand-Rosenberg S. 2008, Curr Opin Genet Dev, 18:11-8; Whiteside T L, 2006, Cancer Treat Res, 130:103-24; Parham P. The Immune System. 2nd ed. New York, N.Y.: Garland Science; 2005). Natural killer cell infiltration into solid tumor tissue has been associated with improved survival in studies of many different types of cancer (Ishigami S et al., 2000, Cancer, 88:577-83; Kondo E et al., 2003, Dig Surg, 20:445-51; Villegas F R et al., 2002, Lung Cancer, 35:23-8). Immune suppression is frequently seen in patients with head and neck cancer (Duray A, et al., 2010, Clinical & developmental immunology, 2010:701657; Pries R, and Wollenberg B, 2006, Cytokine Growth Factor Rev, 17:141-6; Wulff S et al., 2009, Anticancer research, 29:3053-7; Kuss I et al., 2004, Clin Cancer Res, 10:3755-62; Kuss I et al., 2005, Adv Otorhinolaryngol, 62:161-72). Diminished NK cell and natural killer T (NKT) cell activity and number have been observed in the peripheral blood of patients with HNSCC (Wulff S et al., 2009, Anticancer research, 29:3053-7; Molling J W et al., 2007, J Clin Oncol, 25:862-8).

A novel DMR is identified herein that distinguishes NK cells from other leukocytes to facilitate the quantification of NK cells in archived blood samples from a case-control study of HNSCC. Many chemical exposures, such as tobacco and alcohol, as well as viral factors, such as human papilloma virus (HPV), are known or suspected to be causal factors in HNSCC (Furniss C S et al., 2009, Annals of oncology: official journal of the European Society for Medical Oncology/ESMO, 20:534-41; Applebaum K M et al., 2007, Journal of the National Cancer Institute, 99:1801-10) and may independently affect immune profiles (Mehta H et al., 2008, Inflammation research, 57:497-503; Wansom D et al., 2010, Archives of otolaryngology-head & neck surgery, 136:1267-73; Gao B et al., 2011, American journal of physiology Gastrointestinal and liver physiology 300:G516-25). Unlike previous studies, data shown herein evaluate the effects of these factors on the depression in NK immune profile. Patient risk factors and disease characteristics (e.g. tumor location) are evaluated herein in relationship to NK cells to determine the independent associations of HNSCC with innate immune parameters.

NK cell-specific DNA methylation was identified by analyzing DNA methylation and mRNA array data from purified blood leukocyte subtypes (NK, T, B, monocytes, granulocytes), and confirmed via pyrosequencing and methylation specific quantitative PCR (MS-qPCR). NK cell levels in archived whole blood DNA from 122 HNSCC patients and 122 controls from a study population were assessed by MS-qPCR. Details of this study population have been previously described (Applebaum K M et al., 2007, Journal of the National Cancer Institute, 99:1801-10). Briefly, peripheral blood from 122 control donors and 122 HNSCC patients was collected between December 1999 and December 2003 in the greater Boston area. Population-based control subjects with no prior history of cancer were from the same region as cases, and were frequency matched on age and gender. Study approval was obtained from the Brown University Institutional Review Board. Subjects provided written informed consent for participation in this study. Venous anticoagulated whole blood was drawn into sodium citrate and stored at −20° C. prior to DNA isolation.

Pyrosequencing and MS-qPCR (FIG. 39) confirmed that a demethylated DNA region in NKp46 distinguishes NK cells from other leukocytes, and serves as a quantitative NK cell marker. Demethylation of NKp46 was significantly lower in HNSCC patient blood samples compared with controls (p<0.001). Individuals in the lowest NK tertile had an over 5-fold increased risk of being an HNSCC case, controlling for age, gender, HPV16 status, cigarette smoking, alcohol consumption, and BMI (OR=5.6, 95% CI: 2.0, 17.4) (FIG. 37). Cases did not show differences in NKp46 demethylation based on disease treatment or tumor site.

The results of this study indicate a significant depression in NK cells in HNSCC patients that is unrelated to exposures associated with the disease. DNA methylation biomarkers of NK cells represent an alternative to conventional flow cytometry that can be applied in a wide variety of clinical and epidemiologic settings including archival blood specimens.

Understanding of immune cell level alterations associated with cancer and other diseases has, until now, been restricted by the limitations of immunodiagnostic methods. Described herein is a new method for measuring NK cell levels in human blood and tissue based on cell-lineage specific DNA methylation that can be applied to samples regardless of handling and storage procedures. This is a step forward in immune cell detection and quantification that is applicable to many types of clinical samples. Applying the method to a case-control study of HNSCC (Examples 27-36) revealed a case-associated decrease in circulating NK cells that is independent of known risk factors and treatments. This shows that it is important to monitor NK cell levels in patients with HNSCC, and that it may be worthwhile to pursue future immune therapies aimed at restoring circulating NK cells in patients with HNSCC.

A variety of methods are available as bases for methodology used to analyze CpG methylation states. These methods can be divided roughly into two types: gene-specific and global methylation analysis. A large number of techniques have been developed for gene-specific CpG methylation analysis. Early studies used methylation sensitive restriction enzymes to digest DNA followed by Southern detection or PCR amplification. Bisulfite reaction based methods such as methylation specific PCR (MSP) and bisulfite genomic sequencing PCR are commonly used currently. Global methylation analysis measures the overall level of methyl cytosines in the genome by methods such as chromatography or methyl accepting capacity assay. Further, methylation hot-spots or methylated CpG islands in the genome may also be identified by several of the recently developed genome-wide screen methods such as Restriction Landmark Genomic Scanning for Methylation (RLGS-M) and CpG island microarray.

The gene-specific method MethyLight is a highly sensitive high-throughput quantitative methylation assay, capable of detecting methylated alleles in the presence of a 10000-fold excess of unmethylated alleles using fluorescence-based real-time PCR technology that requires few or minor further manipulations after the PCR step (Eads C A et al., 2000, Nucl. Acids Res. 28(8): e32). For example, a MethyLight assay is commercially available from QIAGEN, Inc., Valencia, Calif.

In another embodiment of the method, analyzing the methylation of any gene, e.g., the CD3Z gene, through amplification by Polymerase Chain Reaction (PCR) is performed using digital PCR. Digital PCR is an improved method of PCR useful to overcome difficulties associated with conventional PCR. Conventional PCR assumes that amplification of nucleic acid is exponential, and nucleic acids are quantified by comparing the number of amplification cycles and amount of PCR end-product to those of a reference sample. In practice, however, several factors interfere with this calculation, making measurements uncertain and inaccurate and hence unsuitable for highly sensitive measurements.

In digital PCR, a sample is partitioned so that individual nucleic acid molecules within the sample are localized and concentrated within many separate regions. Each partition contains “0” or “1” molecules, or a negative or positive reaction, respectively. After PCR amplification, nucleic acids are quantified by counting the regions that contain PCR end-product, which is a count of positive reactions, and the number of molecules is estimated from that count using a Poisson distribution. A system for digital PCR based on integrated fluidic circuits (chips) having integrated chambers and valves for partitioning samples is commercially available. For example, a digital PCR system is available from Life Technologies (Grand Island, N.Y. 14072 USA) and QuantaLife (Pleasanton, Calif. USA).
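The Poisson estimate mentioned above can be sketched as follows. This is a minimal illustration rather than the procedure of any particular instrument: it assumes molecules are distributed at random among equal-volume partitions, so that the expected fraction of negative partitions is exp(-λ) for a mean of λ molecules per partition. The function names, and the helper that forms a demethylated-to-reference ratio from two assays run on the same number of partitions, are hypothetical and introduced only for illustration.

```python
import math


def mean_copies_per_partition(n_positive: int, n_partitions: int) -> float:
    """Poisson-corrected mean number of target molecules per partition.

    With random loading, P(partition is negative) = exp(-lam), hence
    lam = -ln(1 - n_positive / n_partitions).
    """
    if n_partitions <= 0:
        raise ValueError("n_partitions must be positive")
    if not 0 <= n_positive < n_partitions:
        # If every partition is positive, the estimate is unbounded.
        raise ValueError("need 0 <= n_positive < n_partitions")
    return -math.log(1.0 - n_positive / n_partitions)


def total_copies(n_positive: int, n_partitions: int) -> float:
    """Estimated number of target molecules across all partitions."""
    return n_partitions * mean_copies_per_partition(n_positive, n_partitions)


def demethylated_fraction(pos_demethylated: int, pos_reference: int,
                          n_partitions: int) -> float:
    """Hypothetical helper: demethylated copies divided by reference copies.

    Assumes both assays were read on the same number of partitions.
    """
    reference = total_copies(pos_reference, n_partitions)
    if reference == 0:
        raise ValueError("reference assay has no positive partitions")
    return total_copies(pos_demethylated, n_partitions) / reference
```

When only a small fraction of partitions is positive, the corrected estimate is close to the raw count of positive reactions; the correction matters as the positive fraction grows and partitions holding more than one molecule become common.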

This application relates to international application PCT/US2012/039669 filed May 25, 2012 (published as international publication number WO/2012/162660 published Nov. 29, 2012), which claims the benefit of provisional applications having Ser. Nos. 61/489,883 filed May 25, 2011 entitled, “Methods of Immunodiagnostics using DNA Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells”; 61/509,644, filed Jul. 20, 2011 entitled “Methods of Immunodiagnostics using DNA Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells for prognosis and diagnosis of diseases”; 61/585,892 filed Jan. 12, 2012 entitled, “Methods of Immunodiagnostics using DNA Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells for prognosis and diagnosis of diseases”; and 61/619,663, filed Apr. 3, 2012 entitled “Methods using DNA Methylation arrays for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies” inventors Karl Kelsey, Eugene Andres Houseman, John Wiencke, William P. Accomando, Jr. and Carmen Marsit, each of which applications including the sequence listings is hereby incorporated herein by reference in its entirety. A portion of the examples and figures herein have been submitted as an appendix to provisional application Ser. No. 61/865,479 filed Aug. 13, 2013, entitled, “Methods using DNA methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies”, and is an unpublished manuscript submitted to the journal Genome Biology entitled, “Quantitative reconstruction of leukocyte subsets using DNA methylation” by William P. Accomando, Jr., John Wiencke, Eugene Andres Houseman, Heather H. Nelson, and Karl Kelsey.

The invention having been fully described is further illustrated by the following claims and examples herein. Data in Examples herein show that cell mixture distributions within peripheral blood were assessed accurately and reliably using DNA methylation. DNA methylation was measured and analyzed in leukocyte subsets purified from whole blood, and a library of lineage-specific DNA methylation signatures that distinguish human T-cells, B-cells, NK cells, monocytes, eosinophils, basophils and neutrophils was established. The library was used as a reference to quantify simultaneously these cell types in DNA from adult human blood. The methods described were successful in detecting clinically relevant shifts in leukocyte populations. The methods, compositions and kits herein more accurately analyzed human whole blood samples compared to established methods of immune cell quantification. Data obtained by these methods using DNA methylation were found to be unaffected by duration of storage of blood. Data show that it was possible, using only DNA rather than whole cells by the methods herein, to reconstruct precise immune cell differential numbers. Methods in various embodiments used a library including signatures comprising differentially methylated regions (DMRs) from types of leukocytes in a blood sample of the patient. In various embodiments, the library includes at least one gene or locus selected from the group consisting of: FGD2, HLA-DOB, BLK, IGSF6, CLDN15, SFT2D3, ZNF22, CEL, HDC, GSG1, FCN1, OSBPL5, LDB2, NCR1, EPS8L3, CD3D, PPP6C, CD3G, TXK, and FAIM. In various embodiments, the library includes at least one selected from the group consisting of: CLEC9A (2 loci), INPP5D, INHBE, UNQ473, SLC7A11, ZNF22, XYLB, HDC, RGR, SLCO2B1, C1orf54, TM4SF19, IGSF6, KRTHA6, CCL21, SLC11A1, FGD2, TCL1A, MGMT, CD19, LILRB4, VPREB3, FLJ10379, HLA-DOB, EPS8L3, SHANK1, CD3D (2 loci), CHRNA3, CD3G (2 loci), RARA, and GRASP. The nucleotide sequence and corresponding amino acid sequence of each of the genes or loci are listed in genome or protein databases such as GenBank, European Nucleotide Archive, European Bioinformatics Institute, GenomeNet, or The National Center for Biotechnology Information (NCBI) Protein database.

Examples herein accurately assessed cell mixture distributions within peripheral blood using DNA methylation. DNA methylation was measured in leukocyte subsets purified from whole blood and was used to establish a library of lineage-specific DNA methylation signatures that distinguished human T-cells, B-cells, NK cells, monocytes, eosinophils, basophils, and neutrophils. This library was used as a reference to simultaneously quantify these cell types in DNA from adult human blood. Methods, compositions and kits described herein more effectively detected clinically relevant shifts in leukocyte populations than established methods of immune cell quantification performed on human whole blood samples. Unlike established methods, methods described herein were not affected by type and duration of storage of blood samples. Data show that precise immune cell differential estimates were reconstructed using only DNA rather than whole cells.

Different human cell types, defined by function and morphology, are shown in Examples herein in complex mixtures using a variety of physical, optical and proteomic characteristics (Pollard, T. D. et al. 2007 Cell Biology second edition Saunders Elsevier publishing, Philadelphia, Pa.).

Lineage-specific DNA methylation has been investigated to distinguish different types of cells (Baron, U. et al. 2006 Epigenetics 1: 55-60; Wieczorek, G. et al. 2009 Cancer Res 69: 599-608; Sehouli, J. et al. 2011 Epigenetics 6: 236-246; Wiencke, J. K. et al. 2012 Epigenetics 7: 1391-1402; Accomando, W. P. et al. 2012 Clin Cancer Res 18: 6147-6154; Christensen, B. C. et al. 2009 PLoS Genet. 5, e1000602, doi:10.1371/journal.pgen.1000602). Patterns of DNA methylation, occurring at cytosine residues in the context of cytosine-guanine (CpG) dinucleotides, are tightly associated with chromatin conformation, which coordinates gene expression and reflects transcriptional programming of gene expression (Bird, A. 2002 Genes & development 16: 6-21; and Zaidi, S. K. et al. 2011 The Journal of biological chemistry 286: 18355-18361). During differentiation, somatic cell lineages undergo de novo DNA methylation followed by maintenance methylation (Jaenisch, R. 1997 Trends in genetics: TIG 13: 323-329), thereby establishing mitotically heritable, cell lineage-specific methylation signatures (Khavari, D. A., et al. 2010 Cell Cycle 9, 3880-3883; Bocker, M. T. et al. 2011 Blood 117, e182-189; Meissner, A. 2010 Nature biotechnology 28, 1079-1088; Hawkins, R. D. et al. 2010 Cell Stem Cell 6: 479-491). Patterns of DNA methylation served as reliable indicators of cell lineage and were used as sensitive and specific biomarkers for diverse cell types (Baron, U. et al. 2006 Epigenetics 1: 55-60; Accomando, W. P. et al. 2012 Clin Cancer Res 18: 6147-6154; Meissner, A. 2010 Nature biotechnology 28, 1079-1088; Davies, M. N. et al. 2012 Genome Biol 13: R43, doi:10.1186/gb-2012-13-6-r43; and Varley, K. E. et al. 2013 Genome Res 23: 555-567).

The immune system is a powerful model for investigating, developing and implementing new approaches to human cell detection and quantification. Blood is a complex mixture of many different specialized cell types and the composition of white blood cell (WBC, or leukocyte) populations reflects disease states and toxicant exposures (Bui, J. D. et al. 2007 Curr Opin Immunol 19: 203-208; Kim, R. et al. 2007 Immunology 121: 1-14; Ostrand-Rosenberg, S. 2008 Curr Opin Genet Dev 18: 11-18; Dunn, G. P. et al. 2002 Nat Immunol 3: 991-998; Shimizu, J. et al. 1999 J Immunol 163, 5211-5218; Zou, W. 2006 Nat Rev Immunol 6: 295-307). Thus, the ability to detect an improper balance of immune cells is valuable in both clinical and research settings. However, research aimed at further understanding immune cell level alterations is restricted by the limitations of immunodiagnostic methods. Routine blood leukocyte differentiation is achieved using physical cell isolation and the electrical impedance or optical light scattering properties of the cells (Handin, R. I., Lux, S. E. & Stossel, T. P. 2003 Blood: Principles and Practice of Hematology second edition, 2304, Lippincott Williams & Wilkins). Fluorescently labeled antibodies and flow cytometry are used to identify specialized cell subtypes, e.g. CD4+ T-cells (Sehouli, J. et al. 2011 Epigenetics 6: 236-246; Dieye, T. N. et al. 2011 Journal of immunological methods 372: 7-13). These methods rely upon intact cells, and therefore require fresh samples and cannot be applied to older, archived blood samples.

Human leukocytes derive from pluripotent hematopoietic stem cells through a developmental process called hematopoiesis, resulting in a hierarchy of leukocyte lineages each with unique functions and gene expression patterns (Parham, P. 2005 The Immune System second edition, Garland Science, New York, N.Y.). Epigenetic regulation of gene expression is important to hematopoiesis; cellular fates are largely determined by patterns of DNA packaging into chromatin (Janson, P. C. et al. 2009 Biochim Biophys Acta 1790: 906-919).

Examples herein show that human leukocyte lineages were distinguished with very high sensitivity and specificity by epigenetic marks such as patterns of DNA methylation occurring in differentially methylated regions (DMRs). The identification of DMRs that are biomarkers of specific human leukocyte lineages resulted in the development of sensitive assays for monitoring these leukocytes in the peripheral blood by measuring DNA methylation. While some immune cell lineage-specific DMRs have been used in assays to detect and quantify a single type of leukocyte in human blood and tissue (Wieczorek, G. et al. 2009 Cancer Res 69: 599-608; Sehouli, J. et al. 2011 Epigenetics 6: 236-246; Wiencke, J. K. et al. 2012 Epigenetics 7: 1391-1402; Accomando, W. P. et al. 2012 Clin Cancer Res 18: 6147-6154), Examples herein elucidate a different approach to simultaneously quantify the entire distribution of WBC types in human blood using methylation profiles assessed in archived DNA.

The compositions, methods and kits herein are useful for assessing immune modulations, including immune profiling performed in a wide variety of archival blood samples from large epidemiological studies of human disease and exposure and clinical trials of drug efficacy and biomonitoring. Examples herein include a novel platform for expansion of the nascent field of human immunotoxicology. Compositions, methods and kits herein provide an effective improvement in a vast number of novel diagnostic and therapeutic procedures, by serving as a reliable alternative to the accepted reference standard of manual differential as well as the automated differential and even FACS-based analysis. Thus, compositions, methods and kits herein are useful in clinical applications as well as population studies, aiding in diagnostic follow-up, toxicologic assessment and in numerous new approaches being developed in translational medical research. Furthermore, Examples herein provide new approaches to clinical profiling of immune response to therapy for chronic diseases.

Without being limited by any particular theory or mechanism of action, it is envisioned that the compositions, methods and kits provided herein can be used to identify, characterize and enumerate any type of lineage-stable human cells within complex mixtures. This presents an unprecedented opportunity for the development of a new generation of methods for cellular quantification that exploits the human methylome, supporting the feasibility of “molecular” histology. Using the immune system as a model, Examples herein created a paradigm for the mapping of cell-specific DNA methylation signatures in order to generate reference libraries of efficacious biomarkers that distinguish different cell types. During mitosis, patterns of DNA methylation are replicated at the time of DNA synthesis such that daughter cells inherit both genetic material and epigenetic information contained within the parental cell (Khavari, D. A. et al. 2010 Cell Cycle 9, 3880-3883).

Examples herein establish powerful computational tools to quantitatively reconstruct the precise makeup of cellular mixtures. In the past, simultaneous quantification of normal or disease-associated changes in cell population composition has been accomplished using flow cytometry, electrical impedance, light scatter and/or immunohistochemistry. These approaches require large volumes of fresh blood or tissue, and, for flow cytometry, can involve laborious antibody tagging (Roussel, M., et al. 2010 Cytometry. Part A: the journal of the International Society for Analytical Cytology 77: 552-563; Mittag, A. et al. 2011 Methods in cell biology 103: 1-20). In contrast, Examples herein use high-throughput techniques which entail simple, convenient DNA analysis methods that can easily be automated to facilitate rapid quantitative reconstruction of cell subsets. Moreover, the assays and arrays (e.g., LDMA) employed use different chemistry than the HDMA, highlighting the cross-platform applicability of the approach described herein.

Further examples of the inventions are found in a manuscript (48 pages) submitted to the journal Genome Biology entitled, “Quantitative reconstruction of leukocyte subsets using DNA methylation” by William P. Accomando, Jr., John K. Wiencke, E. Andres Houseman, Heather H. Nelson, and Karl T. Kelsey, which is incorporated by reference herein in its entirety.

A skilled person will recognize that many suitable variations of the methods may be substituted for or used in addition to those described above and in the claims. It should be understood that the implementation of other variations and modifications of the embodiment of the invention and its various aspects will be apparent to one skilled in the art, and that the invention is not limited by the specific embodiments described herein and in the claims. The present application mentions various patents, scientific articles, and other publications, each of which is hereby incorporated herein in its entirety by reference.

The invention having now been fully described, it is exemplified by the following examples and claims which are for illustrative purposes only and are not meant to be further limiting.

EXAMPLES

Example 1 Statistical Methods for Using DNA Methylation Arrays as Surrogate Measures of Cell Mixture Distribution

In the framework for measurement of methylation status of CpG sites in cell mixtures, $Y_{0h}$ represents an $m \times 1$ vector of methylation assay values, e.g. average beta values from an Infinium bead-array product, corresponding to a purified blood sample consisting of a homogeneous cellular population (e.g. monocytes or granulocytes), with the qualitative characterization of the cell type indicated by a $d_{0} \times 1$ covariate vector $w_{0h}$. Here, $h \in \{1, \ldots, n_{0}\}$, and the $m$ individual values correspond to CpG sites on a DNA methylation microarray, possibly pre-selected to correspond to putative DMRs for distinguishing different cellular types. Correspondingly, $Y_{1i}$ represents an $m \times 1$ vector of methylation assay values for the same CpG sites (in the same order) as $Y_{0h}$ but corresponding to a heterogeneous mixture of cells (e.g. peripheral whole blood) from a human subject. Here, $i \in \{1, \ldots, n_{1}\}$, $n_{1}$ is the number of target specimens, and $z_{1i}$ is a $d_{1} \times 1$ covariate vector representing an intercept as well as phenotypes or exposures corresponding to the subject, e.g. $d_{1} = 2$ for a simple case/control study without confounders. Here the goal is to understand the associations between $Y_{1i}$ and $z_{1i}$ in terms of associations between $Y_{0h}$ and $w_{0h}$, i.e. to infer changes in mixtures of cell types associated with phenotypes or exposures, using DNA methylation as a surrogate measure of cell mixture. Thus, there are two data sets: $S_{0} = \{(Y_{01}, w_{01}), \ldots, (Y_{0n_{0}}, w_{0n_{0}})\}$, the set of data from “purified” cell samples effectively representing external validation or gold-standard data, and $S_{1} = \{(Y_{11}, z_{11}), \ldots, (Y_{1n_{1}}, z_{1n_{1}})\}$, representing surrogate data collected from a target population. To this end the following linear models are provided:

$Y_{0h} = {{B_{0}w_{0h}} + e_{0h}}$

$\begin{matrix}{Y_{1i} = {\mu_{1} + {B_{1}z_{1i}} + e_{1i}},} & (1)\end{matrix}$

where $B_{0}$ and $B_{1}$ are, respectively, $m \times d_{0}$ and $m \times d_{1}$ matrices and $e_{0h}$ and $e_{1i}$ are error vectors. For simplicity a one-way ANOVA parameterization for $w$ is assumed. Slight generalizations to account for design complications met in practice are described in Example 2.

A reasonable regression parameterization for $z$ is also assumed, including an intercept, and for convenience, the first column of $B_{0}$ is denoted as $\mu_{1}$, the $m \times 1$ intercept. The error vectors $e_{0h}$ and $e_{1i}$ may reflect independence among arrays $h$ and $i$, or else may have more complex random effects structure accounting for technical effects or biological replication; however, their substructures are incidental to this analysis, with the exception of the fine details of the bootstrap procedure proposed below.

To implement a surrogacy relation, the following linking regression model is proposed:

$\begin{matrix}{B_{1} = {{1_{m}\gamma_{0}^{T}} + {B_{0}\Gamma} + U},} & (2)\end{matrix}$

where $\Gamma$ is a $d_{0} \times d_{1}$ matrix that summarizes associations between the rows of $B_{0}$ and $B_{1}$, and $U$ is a matrix of errors. Substituting equation (2) into (1), writing $B_{0} = (b_{01}, \ldots, b_{0d_{0}})$ explicitly in terms of its columns and writing $\Gamma^{T} = (\gamma_{1}, \ldots, \gamma_{d_{0}})$, it follows that

$\begin{matrix}{Y_{1i} = {{\sum\limits_{l = 0}^{d_{0}}{b_{0l}\left( {\gamma_{l}^{T}z_{1i}} \right)}} + {\left( {{1_{m}\gamma_{0}^{T}} + U} \right)z_{1i}} + {e_{1\; i}.}}} & (3)\end{matrix}$

To impart a biological interpretation, it is assumed that the DNA assayed in $S_1$ arises as a mixture of DNA from cell types profiled in $S_0$, with mixture coefficients whose population averages, conditional on $z$, are $\{\omega_1^{(z)}, \ldots, \omega_{d_0}^{(z)}\}$, so that

$\begin{matrix}{{{E\left( {\left. Y_{1i} \middle| z_{1i} \right. = z} \right)} = {\xi^{(z)} + {\sum\limits_{l = 1}^{d_{0}}{b_{0\; l}\omega_{l}^{(z)}}}}},} & (4)\end{matrix}$

where the $m \times 1$ vector $\xi^{(z)}$ represents cell types excluded from consideration among the purified samples in $S_0$, or else non-cell-specific methylation, including alterations at the molecular level in the maintenance of DNA methylation patterns themselves (possibly exposure related, age related, or disease related). It follows from (3) and (4) that the mixture coefficients are recoverable from $\Gamma$, with $\omega_l^{(z)} = \gamma_l^T z$, provided $\xi^{(z)}$ is orthogonal to the column space of $B_0$. As discussed in detail in Example 3, bias can arise if differences in $\xi^{(z)}$ between distinct values of $z$ have nonzero projection onto the column space of $B_0$, although the magnitude of anticipated biases can be assessed through sensitivity analysis as shown in Example 11.

It is possible to assign interpretations to the components of variation in (3). $SS_o$ represents overall variability in $Y_{1i}$, i.e. $SS_o = \sum_{i=1}^{n_1} \|Y_{1i} - \mu_1\|^2$, where $\mu_1 = E(Y_{1i})$. From multivariate probability theory it is straightforward to show that $SS_o = SS_e + SS_v + SS_u$, where

$SS_e = \sum_{i=1}^{n_1} \|e_{1i}\|^2,$

$SS_v = \sum_{i=1}^{n_1} (z_{1i} - \bar{z}_1)^T \Gamma^T B_0^T B_0 \Gamma (z_{1i} - \bar{z}_1),$

$SS_u = \sum_{i=1}^{n_1} \left\{ (z_{1i} - \bar{z}_1)^T U^T U (z_{1i} - \bar{z}_1) + m (z_{1i} - \bar{z}_1)^T \gamma_0 \gamma_0^T (z_{1i} - \bar{z}_1) \right\}.$

$SS_e$ measures variation unexplained by the covariates $z_{1i}$, presumed to represent a combination of technical noise and unsystematic biological heterogeneity. $SS_v$ measures variability explained by mixtures of profiles in the set $S_0$, and $SS_u$ measures variability in systematic biological heterogeneity that nevertheless remains unexplained by mixtures of profiles in $S_0$, presumably due to some process other than differences in mixtures of cell types. Thus two partial coefficient of determination measures are proposed: $R_{1,0}^2 = SS_v / SS_o$, which represents the proportion of total variation in $S_1$ explained by $S_0$, and $R_{1,1}^2 = SS_v / (SS_o - SS_e)$, which represents the proportion of systematic variation in $S_1$ explained by $S_0$. It is noted that $R_{1,1}^2$ is poorly defined when $SS_o \approx SS_e$.
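
As a minimal sketch of how the two partial coefficients of determination relate to the three variance components, the following Python helper may be considered. The function name `partial_r2` and the choice to return NaN when the systematic variation is zero are illustrative assumptions and are not part of the method described herein.

```python
def partial_r2(ss_e, ss_v, ss_u):
    """Return (R2_10, R2_11) from the three variance components.

    ss_e: variation unexplained by the covariates
    ss_v: variation explained by mixtures of profiles in S0
    ss_u: systematic variation not explained by S0
    """
    ss_o = ss_e + ss_v + ss_u          # total variation, SS_o
    r2_10 = ss_v / ss_o                # share of total variation explained by S0
    systematic = ss_o - ss_e           # SS_v + SS_u
    # R2_11 is poorly defined when SS_o is close to SS_e (hypothetical guard)
    r2_11 = ss_v / systematic if systematic > 0 else float("nan")
    return r2_10, r2_11
```

For hypothetical components (85, 14, 1), this gives a share of total variation of 0.14 and a share of systematic variation of 14/15.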

Estimation proceeds by applying an appropriate linear model, e.g. ordinary least squares, linear mixed effects models (Wang and Petronis, 2008, DNA Methylation Microarrays: Experimental Design and Statistical Analysis. Chapman & Hall, Boca Raton, Fla.), limma (Smyth, 2004, Stat Appl Genet and Mol Biol, 3(1), 3), or surrogate variable analysis (Teschendorff et al., 2011, Bioinformatics, 27(11), 1496-505), to obtain estimates $\hat{B}_0$ and $\hat{B}_1$. Estimates of $\gamma_0$ and $\Gamma$ are then obtained by projecting $\hat{B}_1$ onto the column space of $\tilde{B}_0 = (1_m, \hat{B}_0)$, as described in detail in Example 3. Standard errors can be obtained in one of three ways. The simplest estimator, $SE_0$, is the “naive” estimator from simple least squares theory, ignoring the fact that $\hat{B}_0$ and $\hat{B}_1$ are estimates, i.e. potentially variable. To account for variation in estimating $\hat{B}_1$, a simple alternative is to use a nonparametric bootstrap procedure.

For each bootstrap iteration $t$, sampling is performed with replacement from $S_1$ (or errors are sampled in a manner consistent with a hierarchical experimental design) to obtain $S_1^{(t)}$, producing bootstrap estimates $\hat{B}_1^{(t)}$ from which “single-bootstrap” standard errors $SE_1$ are computed. Finally, it is possible to account for variation in estimating $B_0$ by also bootstrapping $S_0$; because of potentially small sample sizes $n_0$, a parametric bootstrap is proposed herein. A “double-bootstrap” standard error estimator, $SE_2$, is computed from these two sets of bootstraps. The double-bootstrap has the additional benefit over the single-bootstrap that it can be used to assess bias due to measurement error (variability) in $\hat{B}_0$. Estimation details are provided in Example 3.

Beyond bias due to measurement error, which is easily corrected using the double-bootstrap procedure, there are additional sources of potential bias. For example, a univariate $z$ representing case/control status is considered, where $\delta \equiv \xi^{(1)} - \xi^{(0)} = B_0 \alpha$ for some $d_0 \times 1$ vector $\alpha \neq 0$. In such a situation, there will be a bias equal to $\alpha$ in estimating the mixture differences. Example 3 provides a detailed analysis of such biases, and proposes a sensitivity analysis procedure for assessing the magnitude of possible bias in a given data set.

In the examples herein, the method for inferring changes in the distribution of white blood cells between different subpopulations is used for analysis of population data. It is also possible to use $S_0$ to predict the distribution of leukocytes in a single sample having DNA methylation profile $Y^*$. Equating the intercept term of $B_1$ in (1) with $Y^*$ and applying (2), the mixing proportion estimates $\Gamma^* = (\tilde{B}_0^T \tilde{B}_0)^{-1} \tilde{B}_0^T Y^*$ are obtained. Estimates can be further refined with the use of quadratic programming techniques (Goldfarb and Idnani, 1983, Math Prog, 27, 1-33), restricting the components $\gamma_l^*$ of $\Gamma^*$ to $\gamma_l^* \geq 0$ in minimizing $\|Y^* - B_0 \Gamma^*\|^2$ with respect to $\Gamma^*$. Such individual projections of methylation profiles on the column space spanned by $S_0$ facilitate the application of the fundamental ideas proposed above to individual, clinically-based diagnostic procedures.
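
The single-sample projection can be illustrated with a short Python sketch using NumPy. It is offered under stated assumptions only: the function name `project_profile` is hypothetical, and the non-negativity constraint is imposed here by a simple projected-gradient iteration, which is a stand-in for, and not the same algorithm as, the Goldfarb and Idnani quadratic programming method cited above.

```python
import numpy as np

def project_profile(B0, y, n_iter=5000):
    """Project one methylation profile y (length m) onto purified profiles B0 (m x d0).

    Returns (unconstrained, constrained):
      unconstrained: least-squares coefficients on (1_m, B0), intercept first
      constrained:   non-negative coefficients minimizing ||y - B0 g||^2
    """
    m = B0.shape[0]
    B0_tilde = np.column_stack([np.ones(m), B0])
    unconstrained, *_ = np.linalg.lstsq(B0_tilde, y, rcond=None)

    # Projected gradient descent (assumed substitute for quadratic programming).
    step = 1.0 / np.linalg.norm(B0, 2) ** 2      # 1 / largest eigenvalue of B0^T B0
    g = np.clip(unconstrained[1:], 0.0, None)    # start from the clipped LS solution
    for _ in range(n_iter):
        grad = B0.T @ (B0 @ g - y)
        g = np.clip(g - step * grad, 0.0, None)  # step, then project onto g >= 0
    return unconstrained, g
```

Because DNA methylation carries information on distributions rather than counts, the returned coefficients are read as proportions; the sketch does not force them to sum to one.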

It is noted that DNA methylation arrays are typically focused on the comparison of methylated to unmethylated CpG dinucleotides, not on quantifying actual amounts of DNA. Therefore, information on cell mixtures from DNA methylation is limited to distributions, not actual counts, as one might obtain from flow cytometry. In addition, it is possible to model $z_{1i}$ directly as a function of mixture coefficients $\Gamma^*$ obtained individually via the constraint $\gamma_l^* \geq 0$.

Example 2 General Designs for the Treatment of Methylation Assay Data Obtained from Purified Cells S₀

Because the cell types assembled in $S_0$ potentially involve hierarchical relationships corresponding to cell lineage, designs that are more general than a one-way ANOVA parameterization may be necessary for $w$. If cell-type interpretations can be extracted from $S_0$ via a $d_0 \times d_0^*$ contrast matrix $L$ (i.e. $B_0 L$ identifies the mean methylation for $d_0^*$ cell types), then interpretations can be obtained by simply replacing $\hat{B}_0$ with $\hat{B}_0 L$ in the projection used to estimate $\gamma_0$ and $\Gamma$ and their standard errors. The case of CD4+ and CD8+ T cells, which together are the primary components of the T-lymphocyte group, is considered as an example. In this example one sample is purified CD4+ T cells, another sample is purified CD8+ T cells, and yet another sample is T-lymphocyte cells that have not been purified to more specific lineages. Such was the case for $S_0$ in the examples. The CD4+ sample may be identified as $w_{0h} = (1,1,0)^T$, the CD8+ sample as $w_{0h} = (1,0,1)^T$, and the latter, less specific sample as $w_{0h} = (1,0,0)^T$. Then an appropriate contrast $L$ for identifying CD4+ and CD8+ samples would be constructed as a $3 \times 2$ matrix with columns $(1,1,0)^T$ and $(1,0,1)^T$. This approach was used in Examples 6-9 below, and was also employed in the simulations.
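
The T-cell contrast just described can be written out as a small NumPy sketch. The column ordering of `B0_hat` (pan-T baseline, CD4+ increment, CD8+ increment) follows the coding of $w_{0h}$ given above; the function name `cell_type_means` and the numeric values in the usage lines are hypothetical.

```python
import numpy as np

# Contrast with columns (1,1,0)^T and (1,0,1)^T, as in the CD4+/CD8+ example.
L = np.array([[1, 1],
              [1, 0],
              [0, 1]])

def cell_type_means(B0_hat, L):
    """Return B0_hat @ L: one column of mean methylation per contrasted cell type."""
    return B0_hat @ L

# Hypothetical coefficients for two CpG sites (rows).
B0_hat = np.array([[0.2, 0.5, -0.1],
                   [0.8, -0.3, 0.0]])
means = cell_type_means(B0_hat, L)  # column 0: CD4+ means, column 1: CD8+ means
```

Each column of the result is the pan-T baseline plus the increment for the corresponding subtype, so it can replace $\hat{B}_0$ in the projection.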

Example 3 Estimation Details and Bias

Estimation:

A two-stage estimation procedure is introduced here. The first stage of analysis involves estimation of $B_0$ and $B_1$ by appropriate linear models, e.g. the ordinary least squares (OLS) regression estimator $\hat{B}_0^T = \left[\sum_{h=1}^{n_0} w_{0h} w_{0h}^T\right]^{-1} \left[\sum_{h=1}^{n_0} w_{0h} Y_{0h}^T\right]$ and a similar estimator for $(\hat{\mu}_1, \hat{B}_1)^T$; a procedure such as limma; or else locus-by-locus linear mixed effects models that adjust for technical (e.g. chip) effects. The second stage of analysis, estimation of $\hat{\gamma}_0$ and $\hat{\Gamma}$, proceeds as follows:

$\begin{matrix}{{\left( {{\hat{\gamma}}_{0},{\hat{\Gamma}}^{T}} \right) = {{\hat{B}}_{1}^{T}{\overset{\sim}{B}}_{0}\left( {{\overset{\sim}{B}}_{0}^{T}{\overset{\sim}{B}}_{0}} \right)^{- 1}}},} & (5)\end{matrix}$

where $\tilde{B}_0 = (1_m, \hat{B}_0)$. Let $\hat{r}_\gamma = \hat{B}_1 - 1_m \hat{\gamma}_0^T - \hat{B}_0 \hat{\Gamma}$, $\hat{\Sigma} \equiv (\hat{\sigma}_{rs}^{(\gamma)})_{rs} = (m - d_0 - 1)^{-1} \hat{r}_\gamma^T \hat{r}_\gamma$, and $V_0 = m(\tilde{B}_0^T \tilde{B}_0)^{-1}$, with $V_0 = (v_{rs}^{(0)})_{rs}$. Naive standard error estimates for the $(r,s)$th element of $(\hat{\gamma}_0, \hat{\Gamma}^T)$ can be obtained by computing $(m^{-1} v_{ss}^{(0)} \hat{\sigma}_{rr}^{(\gamma)})^{1/2}$. The naive standard error estimates fail to account for the variability in estimating $\hat{B}_0$ and $\hat{B}_1$, and are consequently biased, as demonstrated in the simulations, Example 12.
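
The second-stage projection and the naive standard errors can be sketched in Python with NumPy as follows. This is an illustration under assumptions rather than a definitive implementation: the function name `second_stage` is hypothetical, and the results are returned transposed relative to (5), i.e. with one row per column of $\tilde{B}_0$ and one column per covariate.

```python
import numpy as np

def second_stage(B0_hat, B1_hat):
    """Project B1_hat (m x d1) onto the columns of (1_m, B0_hat) (m x (d0+1)).

    Returns (gamma0, Gamma, se):
      gamma0: length-d1 intercept coefficients
      Gamma:  d0 x d1 coefficients
      se:     (d0+1) x d1 naive standard errors, row 0 for gamma0
    """
    m, d0 = B0_hat.shape
    B0_tilde = np.column_stack([np.ones(m), B0_hat])
    H = B0_tilde.T @ B0_tilde
    coef = np.linalg.solve(H, B0_tilde.T @ B1_hat)     # transpose of (gamma0, Gamma^T)

    resid = B1_hat - B0_tilde @ coef                   # residual matrix r_gamma
    Sigma = resid.T @ resid / (m - d0 - 1)             # d1 x d1
    V0 = m * np.linalg.inv(H)                          # (d0+1) x (d0+1)
    # se[s, r] = sqrt(v_ss * sigma_rr / m)
    se = np.sqrt(np.outer(np.diag(V0), np.diag(Sigma)) / m)
    return coef[0], coef[1:], se
```

These standard errors treat $\hat{B}_0$ and $\hat{B}_1$ as fixed, which is the limitation that motivates the bootstrap alternatives.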

A nonparametric bootstrap procedure is used as an alternative. For each bootstrap iteration $t$, sampling is performed with replacement from $S_1$ (or errors are sampled in a manner consistent with a hierarchical experimental design, e.g. taking into account chip effects) to obtain $S_1^{(t)}$. From $S_1^{(t)}$ an estimate $\hat{B}_1^{(t)}$ is obtained, and then $\hat{\gamma}_0^{(t)}$ and $\hat{\Gamma}^{(t)}$ are computed by replacing $\hat{B}_1$ with $\hat{B}_1^{(t)}$ in (5). After resampling a large number $T$ of times, standard errors are obtained empirically from the bootstrap sets $\{\hat{\gamma}_0^{(t)}\}_{t=1,\ldots,T}$ and $\{\hat{\Gamma}^{(t)}\}_{t=1,\ldots,T}$. This method of estimation is called the “single bootstrap” to distinguish it from an alternative that accounts for variability in estimation of $\hat{B}_0$ as well.

Because $S_0$ will typically consist of small sample sizes per cell type, a nonparametric bootstrap procedure for estimating variation in $\hat{B}_0$ may not perform well. Therefore a parametric bootstrap is used. Let $\Omega_j$ be the variance-covariance matrix for the $j$th row of $\hat{B}_0$. A resampled matrix $\hat{B}_0^{(t)}$ is formed by adding, to each row $j$ of $\hat{B}_0$, a zero-mean multivariate normal vector with variance-covariance $\Omega_j$, or a corresponding multivariate t-distribution with $n_0 - d_0$ degrees of freedom. Then $\hat{\gamma}_0^{(t)}$ and $\hat{\Gamma}^{(t)}$ are computed from (5) by replacing $\hat{B}_0$ with $\hat{B}_0^{(t)}$ (in addition to the previously mentioned replacement of $\hat{B}_1$). This method is referred to as the “double bootstrap”. The double bootstrap ignores correlation between CpG sites within a single validation sample; given the relative purity assumed for these samples and adequate correction for technical effects, this is reasonable to first order. As is demonstrated in Examples 6-9 and simulations (Example 10), there is negligible difference between the single and double bootstrap, so the incorporation of additional complexity to model cross-CpG correlations is unlikely to produce much benefit. However, the double-bootstrap has the additional benefit over the single-bootstrap that it can be used to assess bias due to measurement error (variability) in $\hat{B}_0$.
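
The two bootstrap schemes can be sketched together in Python with NumPy. The sketch makes several assumptions that are not part of the description above: the first stage is plain ordinary least squares with subjects resampled independently (no chip or other hierarchical effects), the covariate matrix `Z1` carries its own intercept column, the parametric draws are multivariate normal rather than t-distributed, and all function names are hypothetical.

```python
import numpy as np

def fit_B1(Y1, Z1):
    """OLS first stage: Y1 is n1 x m, Z1 is n1 x d1; returns B1_hat as m x d1."""
    coef, *_ = np.linalg.lstsq(Z1, Y1, rcond=None)
    return coef.T

def project(B0, B1):
    """Second stage: regress B1 on (1_m, B0); returns (d0+1) x d1 coefficients."""
    B0_tilde = np.column_stack([np.ones(B0.shape[0]), B0])
    coef, *_ = np.linalg.lstsq(B0_tilde, B1, rcond=None)
    return coef

def bootstrap_se(Y1, Z1, B0_hat, Omega=None, T=250, seed=0):
    """Single bootstrap when Omega is None, double bootstrap otherwise.

    Omega: optional array of shape (m, d0, d0), the covariance of each row of B0_hat.
    Returns (standard errors, bootstrap means), each (d0+1) x d1.
    """
    rng = np.random.default_rng(seed)
    n1 = Y1.shape[0]
    m, d0 = B0_hat.shape
    draws = []
    for _ in range(T):
        idx = rng.integers(0, n1, size=n1)             # resample subjects in S1
        B1_t = fit_B1(Y1[idx], Z1[idx])
        B0_t = B0_hat
        if Omega is not None:                          # parametric resample of B0_hat
            noise = np.stack([rng.multivariate_normal(np.zeros(d0), Omega[j])
                              for j in range(m)])
            B0_t = B0_hat + noise
        draws.append(project(B0_t, B1_t))
    draws = np.array(draws)
    return draws.std(axis=0, ddof=1), draws.mean(axis=0)
```

The bootstrap means are returned alongside the standard errors because the difference between them and the original estimate is what the double bootstrap uses to assess measurement-error bias.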

Bias:

There are several potential sources of bias in this analysis. The first arises from measurement error in $B_0$, and the others arise from biological non-orthogonality.

It can be shown that the first form of bias, from measurement error, manifests as a multiple of $\Gamma$ on the order of $V_0 \Omega$, where $\Omega = m^{-1} \sum_{j=1}^{m} \Omega_j$. However, it is easily assessed using the double-bootstrap procedure described above, by subtracting $\hat{\gamma}_0$ from $T^{-1} \sum_{t=1}^{T} \hat{\gamma}_0^{(t)}$ and $\hat{\Gamma}$ from $T^{-1} \sum_{t=1}^{T} \hat{\Gamma}^{(t)}$, and bias correction can be implemented by subtracting this term from the estimate.
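
For a single coefficient, the bias assessment and correction just described reduce to two subtractions, as the following Python sketch shows. The function name `bias_correct` and the numeric values in the usage lines are hypothetical.

```python
def bias_correct(estimate, bootstrap_draws):
    """Return (corrected estimate, estimated bias) for one coefficient.

    bias      = mean of the double-bootstrap draws minus the original estimate
    corrected = original estimate minus that bias
    """
    mean_boot = sum(bootstrap_draws) / len(bootstrap_draws)
    bias = mean_boot - estimate
    return estimate - bias, bias

# Hypothetical values: the draws average 2.5 against an estimate of 2.0.
corrected, bias = bias_correct(2.0, [2.5, 2.7, 2.3])
```

The same arithmetic applies element by element to $\hat{\gamma}_0$ and $\hat{\Gamma}$.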

Biases induced by biological non-orthogonality are more insidious. For example, a univariate $z_{1i}$ representing case/control status is considered, where $\delta = \xi^{(1)} - \xi^{(0)} = B_0 \alpha$ for some $d_0 \times 1$ vector $\alpha \neq 0$. In such a situation, there will be a bias equal to $\alpha$ in estimating the mixture differences. Non-orthogonal $\delta$ may arise from two distinct sources. One occurs when some cell types have not been profiled in $S_0$, so that $\sum_{l=1}^{d_0} \omega_l^{(z)} < 1$. The other may arise when some non-cell-mediated biological process (i.e. distinct from a change in cellular mixtures) nevertheless results in methylation profiles that appear similar to those that distinguish cell types profiled in $S_0$. To this end, the model represented by equation (4) is elaborated as follows:

$\begin{matrix}{{{E\left( {\left. Y_{1i} \middle| z_{1i\; 1} \right. = z} \right)} = {{\sum\limits_{l = 1}^{d_{0}}{\left( {{B_{0}ɛ_{l}} + \lambda_{l}^{(z)}} \right)\omega_{l}^{(z)}}} + {\sum\limits_{q = 1}^{Q}{\left( {{\overset{\sim}{\mu}}_{q} + {\overset{\sim}{\lambda}}_{q}^{(z)}} \right){\overset{\sim}{\omega}}_{q}^{(z)}}}}},} & (6)\end{matrix}$

where $q \in \{1, \ldots, Q\}$ indexes unprofiled cell types (or free DNA), each with methylation profile $\tilde{\mu}_q$, and the mixture proportions $\omega_l^{(z)}$ and $\tilde{\omega}_q^{(z)}$ satisfy $\sum_{l=1}^{d_0} \omega_l^{(z)} + \sum_{q=1}^{Q} \tilde{\omega}_q^{(z)} = 1$. Here $\lambda^{(z)}$ denotes an “abnormal”, or at least non-functional, non-cell-mediated process that is specific to disease status (and may affect different cell types in different degrees of intensity).

Let $P = (\tilde{B}_0^T \tilde{B}_0)^{-1} \tilde{B}_0^T$, and denote the difference between case and control parameters using $\Delta$, e.g. $\Delta\omega_l = \omega_l^{(1)} - \omega_l^{(0)}$ and $\Delta E(Y_{1i}) = E(Y_{1i} | z_{1i1} = 1) - E(Y_{1i} | z_{1i1} = 0)$. It follows from equation (6) that

$\begin{matrix}{{P\; \Delta \; {E\left( Y_{1i} \right)}} = {{\sum\limits_{l = 1}^{d_{0}}{ɛ_{l}\Delta \; \omega_{l}}} + {\sum\limits_{q = 1}^{Q}{P\; \mu_{q}\Delta \; {\overset{\sim}{\omega}}_{q}}} + {\sum\limits_{l = 1}^{d_{0}}{P\; {\Delta \left( {\lambda_{l}\omega_{l}} \right)}}} + {\overset{Q}{\sum\limits_{q = 1}}{P\; {{\Delta \left( {\lambda_{q}{\overset{\sim}{\omega}}_{q}} \right)}.}}}}} & (7)\end{matrix}$

The values $\Delta\tilde{\omega}_q$ may need to shift in order to accommodate any shifts in $\Delta\omega_l$, since the model constrains $\sum_{l=1}^{d_0} \Delta\omega_l + \sum_{q=1}^{Q} \Delta\tilde{\omega}_q = 0$. The first term on the right hand side of (7) is the target quantity, identifying the desired mixture weights. The second term will be negligible if the profiles $\tilde{\mu}_q$ are approximately orthogonal to the columns of $B_0$, or else if the differences $\Delta\tilde{\omega}_q$ are small. The latter condition will be satisfied if $S_0$ is exhaustive in the sense that $1 - \sum_{l=1}^{d_0} \omega_l^{(z)}$ is negligible.

Mathematically, it is difficult to further characterize the latter two terms without specifying what kinds of non-cell-mediated processes are likely. For example, even if $\Delta\tilde{\lambda}_q = 0$ for a particular value of $q$, it may nevertheless still produce a bias if $\Delta\tilde{\omega}_q \neq 0$. Conversely, even if $\Delta\omega_l = 0$, bias can result from a nonzero difference $\Delta\lambda_l$ (e.g. different methylation intensities at island shores due to distinct risk profiles) if $\Delta\lambda_l$ is not annihilated by $P$. Only processes that are equal in intensity in both cases and controls and across cell types will be differenced out of equation (7). Thus, a key consideration is whether $P$ annihilates the methylation signature corresponding to a given non-cell-mediated biological process. In order to examine this issue more carefully, a Bayesian view is adopted to characterize a prior expectation of bias as a function of prior probabilities for individual CpG sites. The goal, in part, is to understand the potential for bias, given the number $m$ of CpG sites chosen to be measured in $S_0$, with the goal of selecting $m$ in a manner consistent with minimizing bias.

It is assumed that the CpGs under consideration are ordered in advance (e.g. randomly or by the F-statistic $F_j = d_0^{-1} \hat{B}_{0j\bullet} \Omega_j^{-1} \hat{B}_{0j\bullet}^T$), and the dependence of $H_m = \tilde{B}_0^T \tilde{B}_0$ on $m$ is written explicitly. If the CpGs are randomly ordered, then $\mathrm{tr}\,H_m = O(m)$; otherwise it is possible that $\mathrm{tr}\,H_m = O(m^{1-\zeta})$, $\zeta > 0$, reflecting a diminishing rate of return from adding additional non-informative CpG sites. Then $\delta = \sum_{l=1}^{d_0} P\Delta(\lambda_l \omega_l) + \sum_{q=1}^{Q} P\Delta(\tilde{\lambda}_q \tilde{\omega}_q)$ is decomposed by the number $k$ of CpG sites affected by alterations that distinguish cases from controls. For fixed $k \in J_m = \{1, \ldots, m\}$, each of the $C(m,k) = m!/[k!(m-k)!]$ subsets $J_{kl} \subset J_m$ of $k$ indices corresponds to a vector $\delta_{kl}$ representing the mean methylation difference between case and control over systematic biological processes that result in changes at the $k$ specific CpG sites represented by the $k$ indices, and only those $k$ CpG sites. Thus $\delta_{kl}$ has at most $k$ nonzero values. The bias resulting from such processes is $H_m^{-1} \tilde{B}_0^T \delta_{kl} = O(k m^{\zeta-1})$. A prior probability $\pi_{kl}$ is assumed that the subset $J_{kl}$ could correspond to one or more biological processes that distinguish cases from controls. It follows from this view that the prior expectation of $\delta$ is

$\begin{matrix}{{E\left\lbrack \delta \middle| \left( \pi_{kl} \right)_{kl} \right\rbrack} = {{\overset{m}{\sum\limits_{k = 1}}{\overset{C{({m,k})}}{\sum\limits_{l = 1}}{\pi_{kl}\delta_{kl}}}} = {{O\left( {\overset{m}{\sum\limits_{k = 1}}{\overset{C{({m,k})}}{\sum\limits_{l = 1}}{\pi_{kl}k\; m^{\zeta - 1}}}} \right)}.}}} & (8)\end{matrix}$

If a prior probability over sets of CpG sites in the genome is constructed so that CpG sites are considered independent, and each CpG site is assigned a uniform prior probability of $\pi_0$, then $\pi_{kl} \equiv \pi_0^k (1 - \pi_0)^{m-k}$ and, from (8),

$\begin{matrix}\begin{matrix}{{E\left( \delta \middle| \pi_{0} \right)} = {O\left( {m^{\zeta}{\overset{m}{\sum\limits_{k = 1}}{{C\left( {{m - 1},{k - 1}} \right)}{\pi_{0}^{k}\left( {1 - \pi_{0}} \right)}^{m - k}}}} \right)}} \\{= {{\pi_{0}\left( {1 - \pi_{0}} \right)}{{O\left( m^{\zeta} \right)}.}}}\end{matrix} & (9)\end{matrix}$

The bias does not depend on $m$ if $\mathrm{tr}\,H_m = O(m)$, i.e. under random ordering. Although random ordering renders the size of $E(\delta | \pi_0)$ theoretically independent of $m$, it does so at the cost of including many potentially noninformative CpGs early on, at low values of $m$, and these may be possible sources of bias in practice, without offering any modeling benefit in return. If the CpG sites are ordered by level of informativeness, then potentially $\mathrm{tr}\,H_m = O(m^{1-\zeta})$, and there will be a small increasing prior expectation of bias, motivating judicious choice of $m$. The key, then, is to order the CpGs in terms of their ability to distinguish the different cell types profiled in $S_0$, choosing $m$ large enough to distinguish the signatures from one another, but small enough that $E(\delta | \pi_0)$ is reasonably low, in a relative sense. Naturally, different choices of prior $\pi_{kl}$ in (8) will lead to different conclusions about the magnitude of bias. If the set $J_m$ of CpG sites used in $S_0$ and $S_1$ oversamples those known to have less modifiable methylation states, e.g. away from so-called shore regions (Doi A et al., 2009, Nat Genet. 41: 1350-3), then $\pi_0$ is effectively lowered, and so is the corresponding expected prior bias. It is worth emphasizing that this analysis concerns only a Bayesian prior, not the actual biological truth. In choosing CpG sites among those assayed in $S_0$ and $S_1$, a potentially negative outcome would be to have included a number of sites that also happen to represent systematic, non-cell-mediated biological differences between cases and controls in $S_1$, in which case biased estimates will be inevitable. In summary, bias in the proposed estimation procedure is controlled by selecting a sufficiently exhaustive list of cell types to profile in $S_0$, and by choosing $m$ judiciously.

Example 4 Proof of Concept of Measurement Error Model for Determining Changes in Distribution of White Blood Cells Between Different Subpopulations

In this example, general features of the method herein are described that can be used with existing methylation data sets as benchmarks for validating the proposed method to demonstrate its clinical or epidemiological utility. Examples 6-9 that follow show application of the method to specific data sets. The data analyses involve DNA methylation data obtained by the Infinium HumanMethylation27 Beadchip Microarrays from Illumina, Inc. (San Diego, Calif.). A subset of $m = 100$ CpG sites on the array was used, and the subset was selected as described below. In Examples 6-9, $S_0$ consisted of 46 white blood cell samples; the sorted, normal, human, peripheral blood leukocyte subtypes were purchased from AllCells®, LLC (Emeryville, Calif.) and were isolated from whole blood using a combination of negative and positive selection with highly specific cell surface antibodies conjugated to magnetic beads; materials and protocols were obtained from Miltenyi Biotec, Inc. (Auburn, Calif.). These 46 samples are summarized in Table 2 and depicted by the clustering heatmap in FIG. 1. T lymphocytes that express CD4 or CD8 constitute over 95% of the T cell class. The pan-T cell type was further refined to CD4+, CD8+, and “other” pan-T cell subtypes.

In summary, the covariate vector $w_{0h}$ consisted of indicators for five cell types and another two indicators for CD4+ and CD8+ T cell subtypes. A generalization of the one-way ANOVA parameterization assumed above for $w_{0h}$ (Example 2) was necessary to account for the ambiguous status of some pan-T cells. For each CpG site, a linear mixed effects model with a random intercept for bead chip was used to estimate $B_0$; 27 additional whole blood control samples (replicates from the same individual) were used to assist in estimating chip effects, since otherwise the data set would have been sufficiently sparse to risk confounding between cell type and chip. These “array controls” were indicated with an additional term in $w_{0h}$. For each CpG site, a linear mixed effects model with a random intercept for bead chip was used to estimate the corresponding row of $B_0$ and $B_1$.

From $S_0$, F statistics were computed and used to order each of the 26,486 autosomal CpGs by decreasing level of informativeness with respect to blood cell types. FIG. 5A depicts the relationship of $\log_{10} \mathrm{tr}(H_m)$ to $\log_{10}(m)$ for increasing array sizes. FIG. 5B depicts the relationship of $\partial \log_{10} \mathrm{tr}(H_m) / \partial \log_{10}(m)$ to $\log_{10}(m)$ for increasing array sizes, obtained by smoothing the first differences of the curve depicted in FIG. 5A via a loess smoother. FIG. 5A also shows the tangent (obtained from the loess curve) at low values of $m$. For $O(m)$ convergence, FIG. 5A should show a linear association with slope equal to one, and the curve in FIG. 5B should lie close to the value of 1.0. Neither is the case, i.e. convergence is sub-linear in $m$. It is noted that the rate of convergence dropped precipitously after about 6,000 CpG sites, but was notably slower than $O(m)$ even after $m = 10$. In the range of 1-1000 CpG sites the convergence rate appeared parabolic with a minimum of about 0.85, starting to stabilize in the $m = 100$-$300$ range. Thus, maximum informativeness was provided by the highest ranking $m = 100$-$300$ CpG sites, with $m > 300$ reflecting diminishing returns from adding additional CpGs. Therefore, a moderately low value of $m$ in this range, $m = 100$, consistent with the size of a small custom microarray chip, was chosen.
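
The growth curve underlying FIGS. 5A and 5B can be approximated with a short NumPy sketch. Several assumptions are made for illustration: the rows of `B0_ordered` are taken to be already sorted by decreasing F statistic, $H_m$ is formed from the first $m$ rows of $(1_m, \hat{B}_0)$, and the local slope is computed from raw first differences, whereas the analysis above smoothed them with loess. The function name `trace_growth` is hypothetical.

```python
import numpy as np

def trace_growth(B0_ordered):
    """Return (tr_H, slope) for m = 1..M rows of an ordered m x d0 matrix.

    tr_H[m-1]  = tr(H_m), the trace built from the first m rows of (1, B0)
    slope[m-1] = finite-difference estimate of d log10 tr(H_m) / d log10 m
                 between m and m+1
    """
    # Each row contributes 1 (intercept column) plus its squared norm to the trace.
    row_contrib = 1.0 + np.sum(B0_ordered ** 2, axis=1)
    tr_H = np.cumsum(row_contrib)
    m = np.arange(1, len(tr_H) + 1)
    slope = np.diff(np.log10(tr_H)) / np.diff(np.log10(m))
    return tr_H, slope
```

A slope near 1.0 corresponds to $O(m)$ growth; values below 1.0 indicate that later rows contribute less than the running average, i.e. diminishing returns.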

TABLE 2 Sorted white blood cells in S₀

Short Name             Description                        Number
B cells                CD19+ B-lymphocytes                6
Granulocytes           CD15+ granulocytes                 8
Monocytes              CD14+ monocytes                    5
NK                     CD56+ Natural Killer (NK) cells    11
T cells (CD4+)¹,²      CD3+CD4+ T-lymphocytes             8
T cells (CD8+)¹,³      CD3+CD8+ T-lymphocytes             2
T cells (NKT)¹         CD3+CD56+ natural killer           1
T cells (other)¹       CD3+ T-lymphocytes                 5

¹Considered as a member of the “pan-T cell” group.
²Pan-T cell further refined as also belonging to the “CD4+” group.
³Pan-T cell further refined as also belonging to the “CD8+” group.

Example 5 Cell Mixture Experiment for Validating the Method for Determining Changes in Distribution of White Blood Cells Between Different Subpopulations

This example describes a laboratory reconstruction experiment that validates the concept on which the method herein is based, namely that DNA methylation retains substantial information about cell mixtures. The results of applying the method herein to several different target data sets $S_1$ are described in Examples 6-9.

For the HNSCC and ovarian cancer data sets, for which bead chip data were available, a linear mixed effects model with a random intercept for bead chip was used to estimate the corresponding row of $B_1$. For the remaining data sets, no bead chip data were available; consequently, ordinary least squares was used. 250 bootstrap iterations were used for each example and each of the two bootstrap methods of standard error estimation.

An experiment was conducted which involved six known mixtures of monocytes and B cells and six known mixtures of granulocytes and T cells. FIG. 2 presents both the known fractions (“Expected”) and the resulting predictions (“Observed”) from Infinium 27K profiles, as described above. As FIG. 2 shows, accuracy of prediction is within 10%, and often less than 5%, with the largest errors occurring for granulocytes, as shown in Table 3. It is noted that the sum of the individual observed predictions for each individual profile ranged from 98.9% to 102.7% even though the constraints of the projection do not explicitly constrain the sum to 100%; this provides additional evidence that the DNA methylation profile captures information about cell mixtures.
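
The error summary reported for the reconstruction experiment (minimum, median, and maximum of |Observed % − Expected %|) can be computed as in the following Python sketch. The function name `error_summary` and the percentages in the usage lines are hypothetical and are not the experimental values.

```python
from statistics import median

def error_summary(expected, observed):
    """Return (min, median, max) of |observed - expected| in percentage points."""
    errors = [abs(o - e) for e, o in zip(expected, observed)]
    return min(errors), median(errors), max(errors)

# Hypothetical expected and predicted fractions (%) for one cell type
# across six known mixtures.
expected = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
observed = [1.0, 18.5, 43.0, 60.0, 77.0, 95.0]
lowest, middle, highest = error_summary(expected, observed)
```

Applying the same summary separately to each cell type gives one column of the kind of table shown for the reconstruction results.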

TABLE 3 Summary statistics for errors in cell mixture reconstruction

Results*    B cell    Granulocyte    Monocyte    NK     T cell
minimum     0.0       0.3            0.0         0.0    0.0
median      0.1       6.5            1.1         2.1    0.3
maximum     5.5       10.0           4.1         6.4    5.3

*|Observed % − Expected %|

Example 6 Application of the Methods Herein to the Subpopulations of Head and Neck Cancer Patients and Controls

This example describes the application of the method herein for determining changes in the distribution of white blood cells between different subpopulations to patients having head and neck squamous cell carcinoma (HNSCC). The target data set $S_1$ was obtained from arrays applied to whole blood specimens collected from a random subset of individuals involved in an ongoing population-based case-control study (Peters et al., 2005, Cancer Epidemiol Biomarkers Prev, 14(2), 476-82) of HNSCC: 92 cases and 92 age and sex matched controls. Blood was drawn at enrollment (prior to treatment in 85% of the cases). Mean age among the subjects arrayed in this study was 60 years, and there were 56 females and 128 males, consistent with the higher incidence of the disease in men. Thus, the covariate vector $z$ consisted of an indicator for case/control status, an indicator for male sex, and age (in decades) centered at the mean. The clustering heatmap in FIG. 3 depicts the raw DNA methylation data in $S_1$. Table 4 presents coefficient estimates for case status, double-bootstrap bias estimates (estimates of bias arising from measurement error), as well as naive, single-bootstrap, and double-bootstrap standard error estimates. Each of these quantities is measured in percentage points (%). Estimates of bias arising from measurement error (i.e. substituting estimated quantities for known ones in a two-stage statistical procedure) were almost always less than half a percentage point, and for significant coefficient estimates, always towards the null.

The proportion of CD4+ T-lymphocytes decreased in cases compared with controls, with a bias-corrected estimate of −10.4 percentage points and approximate 95% confidence interval (−13.1%, −7.7%); the proportion of NK cells decreased, with a bias-corrected estimate of −1.5 percentage points and 95% confidence interval (−2.2%, −0.75%); and the proportion of granulocytes increased, with a bias-corrected estimate of 7.6 percentage points and 95% confidence interval (4.2%, 10.9%). There was also some evidence of an increase in CD8+ T-lymphocytes, with an estimate of 4.5 percentage points and 95% confidence interval (2.0%, 7.0%). As shown in Table 5, the proportion of CD4+ T-lymphocytes decreased by 3.3 percentage points (−4.4%, −2.2%) per decade of age, and CD8+ T-lymphocytes increased by 2.0 percentage points (1.0%, 3.0%) per decade. The other coefficients were not statistically significant.
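
The bias-corrected estimates and intervals quoted above appear to follow from the tabulated quantities by subtracting the double-bootstrap bias and adding or subtracting 1.96 double-bootstrap standard errors. The following Python sketch applies that arithmetic to the granulocyte and NK rows of Table 4; the normal-approximation multiplier 1.96 and the function name `corrected_ci` are assumptions made for illustration.

```python
def corrected_ci(est, bias, se, z=1.96):
    """Return (bias-corrected estimate, lower limit, upper limit)."""
    centre = est - bias                 # subtract the double-bootstrap bias
    return centre, centre - z * se, centre + z * se

# (Est, Bias2, SE2) in percentage points, taken from Table 4.
granulocyte = corrected_ci(7.51, -0.07, 1.71)
nk = corrected_ci(-1.43, 0.06, 0.38)
```

Rounded, the granulocyte row gives 7.6 with limits 4.2 and 10.9, and the NK row gives −1.5 with limits −2.2 and −0.75, in agreement with the values stated above.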

For this analysis, $R_{1,0}^2$ was estimated at 14.2%, and $R_{1,1}^2$ was estimated at 93.9%. Thus, a small but non-negligible proportion of total variation (systematic variation + unexplained biological heterogeneity + technical noise) appeared to have been driven by changes in cell population between cases and controls and as a result of aging. $SS_e$ comprised 85% of total variation, so a substantial portion of variability in DNA methylation appeared to remain unexplained (presumably due, in large part, to technical noise). However, most of the systematic variation was explained by changes in cell population.
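
The 85% figure can be checked against the two reported coefficients using only their definitions, $R_{1,0}^2 = SS_v/SS_o$ and $R_{1,1}^2 = SS_v/(SS_o - SS_e)$. Dividing the first by the second gives $(SS_o - SS_e)/SS_o$, so that

$\begin{matrix}{\frac{SS_{e}}{SS_{o}} = {{1 - \frac{R_{1,0}^{2}}{R_{1,1}^{2}}} = {{1 - \frac{0.142}{0.939}} \approx 0.849}}}\end{matrix}$

which rounds to the 85% of total variation attributed to $SS_e$.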

These results were consistent with previous studies, as HNSCC patients are known to display an absolute and relative increase in myeloid derived granulocytes (Trellakis et al., 2011, Int J Cancer, Epub ahead of print, DOI: 10.1002/ijc.25892) and also to display an alteration in lymphoid T cell homeostasis that leads to decreases in CD4+ T cells (Kuss et al., 2004, Clin Cancer Res, 10(11), 3755-62; Kuss et al., 2005, Adv Otorhinolaryngol, 62, 161-72). In addition, the proportion of Treg cells (a subclass of CD4+ T cells) is known to decrease from infancy to adulthood (Mold et al., 2010, Science, 330(6011), 1695-9). The bias estimates obtained from the double-bootstrap procedure allow the correction of bias arising from measurement error. However, there is no statistical procedure for correcting the other possible sources of bias, those arising from changes in distribution among unprofiled cell types as well as non-immune-mediated methylation differences. Example 7 presents a detailed sensitivity analysis which shows that the magnitude of the resulting bias is likely to be small, less than a percentage point.

TABLE 4 Estimates for HNSCC analysis (case vs. control)

                   Est      Bias₂    SE₀     SE₁     SE₂     P-value
(Intercept, γ₀)    −0.62    −0.02    0.41    0.52    0.52    0.23
B Cell             −0.45    0.04     0.30    0.77    0.76    0.55
Granulocyte        7.51     −0.07    0.50    1.73    1.71    <0.0001
Monocyte           0.49     0.10     0.50    0.47    0.48    0.31
NK                 −1.43    0.06     0.56    0.37    0.38    0.00017
T Cell (CD4+)      −9.08    1.32     1.95    1.15    1.39    <0.0001
T Cell (CD8+)      3.06     −1.46    1.96    0.98    1.27    0.016

Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.

TABLE 5 Estimated Regression Coefficients for Sex and Age in HNSCC Data Set

                                 Est      Bias₂    SE₀     SE₁     SE₂     P-value
Sex
               (Intercept, γ₀)   0.12     0.00     0.24    0.57    0.57    0.83
               B Cell            0.38     0.01     0.17    0.85    0.84    0.65
               Granulocyte       −0.29    −0.08    0.28    1.82    1.81    0.87
               Monocyte          0.13     0.01     0.29    0.47    0.47    0.78
               NK                0.49     0.05     0.32    0.40    0.40    0.22
               T Cell (CD4+)     −1.80    0.45     1.12    1.25    1.20    0.13
               T Cell (CD8+)     0.82     −0.44    1.12    1.03    1.04    0.43
(Age − 60)/10
               (Intercept, γ₀)   −0.20    −0.02    0.15    0.24    0.24    0.40
               B Cell            0.24     0.01     0.11    0.34    0.33    0.47
               Granulocyte       1.12     −0.01    0.19    0.67    0.67    0.096
               Monocyte          0.13     0.02     0.19    0.20    0.20    0.54
               NK                −0.22    0.02     0.21    0.15    0.15    0.14
               T Cell (CD4+)     −2.75    0.56     0.73    0.53    0.57    <0.0001
               T Cell (CD8+)     1.44     −0.56    0.73    0.46    0.50    0.0038

Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.

Example 7 Application of the Methods Herein to Subpopulations of Ovarian Cancer Cases and Controls

In this example the method herein for inferring changes in the distribution of white blood cells between different subpopulations (e.g. cases and controls) was applied to an ovarian cancer data set (Teschendorff et al., 2009, PLoS ONE, 4(12), e8274). DNA methylation data for blood samples were obtained from Gene Expression Omnibus (Accession number GSE19711). Only those cases in which blood was collected pre-treatment were used here. After removing four arrays with a preponderance of missing values, the data set consisted of 272 controls and 129 cases in which blood was collected prior to treatment. A clustering heatmap displaying the DNA methylation data is shown in FIG. 6. In this analysis, z consisted of case-control status, age (categorized in five-year increments), and two bisulfite conversion efficiency measures. Tables 6-8 present results for case-control status and estimated regression coefficients for age and bisulfite conversion in the ovarian cancer data set. R_(1,0) ² was estimated at 17.8%, and R_(1,1) ² was estimated at 86.1%.

TABLE 6 Estimates for Ovarian Cancer Analysis (Case vs. Control)

                     Est     Bias₂    SE₀     SE₁     SE₂     P-value
(Intercept, γ₀)     −0.05   −0.05    0.41    0.19    0.20    0.81
B Cell              −1.36    0.02    0.29    0.22    0.23    <0.0001
Granulocyte          8.97   −0.04    0.49    1.02    1.00    <0.0001
Monocyte             0.55    0.06    0.49    0.29    0.30    0.066
NK                  −2.09    0.01    0.55    0.31    0.34    <0.0001
T Cell (cd4+)        5.64    0.18    1.93    1.06    1.34    <0.0001
T Cell (cd8+)       −0.35   −0.17    1.93    0.95    1.19    0.77

Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.

TABLE 7 Estimated Regression Coefficients for Age in Ovarian Cancer Data Set

                     Est     Bias₂    SE₀     SE₁     SE₂     P-value
Age 55-60
(Intercept, γ₀)     −1.24   −0.05    0.37    0.41    0.40    0.0021
B Cell               0.40    0.04    0.27    0.50    0.49    0.42
Granulocyte          0.91    0.04    0.45    2.04    2.02    0.65
Monocyte             0.85    0.12    0.45    0.59    0.58    0.15
NK                  −0.25    0.10    0.50    0.55    0.55    0.65
T Cell (cd4+)       −2.79    0.63    1.76    2.13    1.96    0.15
T Cell (cd8+)        2.22   −0.84    1.77    1.81    1.59    0.16
Age 60-65
(Intercept, γ₀)     −0.72   −0.07    0.35    0.39    0.39    0.070
B Cell               0.54    0.07    0.25    0.49    0.49    0.27
Granulocyte          0.71    0.06    0.42    1.99    1.98    0.72
Monocyte             0.27    0.08    0.42    0.58    0.58    0.64
NK                  −0.24    0.06    0.47    0.55    0.55    0.65
T Cell (cd4+)       −3.54    0.80    1.66    2.02    1.97    0.072
T Cell (cd8+)        2.84   −0.97    1.66    1.85    1.64    0.084
Age 65-70
(Intercept, γ₀)     −0.53   −0.08    0.40    0.41    0.41    0.19
B Cell              −0.03    0.07    0.29    0.51    0.51    0.96
Granulocyte          2.46    0.02    0.48    2.17    2.17    0.26
Monocyte             0.85    0.12    0.48    0.64    0.64    0.18
NK                  −0.89    0.07    0.54    0.59    0.60    0.14
T Cell (cd4+)       −6.12    1.48    1.89    2.18    2.12    0.0038
T Cell (cd8+)        4.37   −1.64    1.89    1.87    1.71    0.011
Age 70-75
(Intercept, γ₀)     −1.20   −0.07    0.40    0.41    0.41    0.0037
B Cell               0.29    0.07    0.29    0.48    0.48    0.55
Granulocyte          2.13   −0.05    0.48    2.05    2.04    0.30
Monocyte             0.76    0.12    0.48    0.60    0.60    0.21
NK                  −0.51    0.19    0.54    0.56    0.55    0.36
T Cell (cd4+)       −6.82    1.97    1.89    2.16    2.12    0.0013
T Cell (cd8+)        5.35   −2.20    1.90    1.89    1.79    0.0028
Age 75+
(Intercept, γ₀)     −0.31   −0.09    0.49    0.46    0.45    0.49
B Cell               0.13    0.08    0.35    0.54    0.53    0.81
Granulocyte          1.10   −0.15    0.58    2.12    2.11    0.60
Monocyte             1.73    0.12    0.59    0.64    0.63    0.0065
NK                  −0.30    0.13    0.66    0.60    0.59    0.61
T Cell (cd4+)       −6.54    1.31    2.30    2.29    2.18    0.0027
T Cell (cd8+)        2.73   −1.37    2.31    2.06    1.86    0.14

Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.

TABLE 8 Estimated Regression Coefficients for Bisulfite Conversion in Ovarian Cancer Data Set

                     Est     Bias₂    SE₀     SE₁     SE₂     P-value
BSC1 (Green/1000)
(Intercept, γ₀)     −0.08    0.00    0.14    0.09    0.10    0.39
B Cell              −0.10    0.00    0.10    0.10    0.10    0.30
Granulocyte          0.13    0.04    0.17    0.40    0.40    0.74
Monocyte             0.13   −0.01    0.17    0.12    0.12    0.26
NK                  −0.09    0.00    0.19    0.14    0.14    0.53
T Cell (cd4+)        0.51   −0.14    0.65    0.48    0.51    0.32
T Cell (cd8+)       −0.23    0.11    0.66    0.40    0.47    0.62
BSC2 (Green/1000)
(Intercept, γ₀)      0.25    0.00    0.14    0.08    0.08    0.0027
B Cell               0.07    0.00    0.10    0.08    0.08    0.40
Granulocyte          0.07    0.01    0.17    0.38    0.37    0.84
Monocyte            −0.18    0.01    0.17    0.10    0.10    0.075
NK                   0.10    0.00    0.19    0.12    0.12    0.41
T Cell (cd4+)       −0.65    0.20    0.67    0.41    0.50    0.20
T Cell (cd8+)        0.63   −0.21    0.68    0.34    0.45    0.16

Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂. It is noted that coefficients are given as %/1000 units fluorescence, and that standard deviations for BSC1 and BSC2 were 1950 and 2169, respectively.

Compared with controls, data obtained from cases showed significant increases in granulocytes and significant decreases in B cells, NK cells, and CD4+ T cells. Cases also showed marginally significant increases in monocytes. These results are consistent with previous literature, in which it has been demonstrated that ovarian cancer patients experience decreases in B and T lymphocytes (den Ouden et al., 1997, Eur J Obstet Gynecol Reprod Biol, 72, 73-77; Bishara et al., 2008, Reprod Biol, 138, 7175; Cho et al., 2009, Cancer Immunol Immunother, 58, 1523), increases in monocytes (den Ouden et al., 1997, Eur J Obstet Gynecol Reprod Biol, 72, 73-77; Bishara et al., 2008, Reprod Biol, 138, 7175) and (somewhat equivocally) increases in eosinophil granulocytes (Bishara et al., 2008, Reprod Biol, 138, 7175). Additionally, there were significant systematic decreases in CD4+ T cells with increasing age, with a gradient consistent in direction and somewhat consistent in magnitude with the corresponding effect found in the HNSCC data set. The CD8+ T cell coefficients for age were positive, with a gradient consistent in direction and somewhat consistent in magnitude with the corresponding effect found in the HNSCC data set. No bisulfite conversion coefficient was significant, and coefficients were of small magnitude (Table 8; generally less than 1 percentage point per standard deviation).

Example 8 Application of the Methods Herein to Subpopulations of Down Syndrome Patients and Controls

The method herein was applied to a trisomy 21 (Down syndrome) data set (Kerkel et al., PLoS Genet. 2010, 6(11):e1001212) consisting of 29 total peripheral blood leukocyte samples from Down syndrome cases and 21 controls, as well as six T cell samples from cases and four T cell samples from controls (GEO Accession number GSE25395). Because of the potential for bias induced by copy number amplification, four CpG sites on Chromosome 21 were excluded, resulting in m=96 CpG sites that were used for analysis. A clustering heatmap displaying the DNA methylation data is shown in FIG. 7. In one analysis, data from cases and controls were compared using the total leukocyte samples only, and in another, total leukocytes were compared to T cells, pooling cases and controls. Coefficient estimates are provided in Table 9. The only significant difference between cases and controls was in B cell distribution, with a bias-corrected estimated decrease of 4.8%, 95% confidence interval (−6.2%; −3.5%). This result is consistent with known immune characteristics of Down syndrome, including deficiencies in both B and T cells (Verstegen et al., 2010, Pediatr Res, 67, 563-9; Ram and Chinen, 2011, Clin Exp Immunol, 164, 9-16). However, in the comparison between total leukocytes and T cells, the coefficients except B Cell and NK were highly significant, in directions consistent with comparison of a sample of purified T cells to a generic whole blood sample. In fact, an estimate of the cellular composition of the T cell samples can be obtained by a simple linear transformation of Γ estimates (adding intercept terms with the T cell coefficients); this operation produces values that are not significantly distinct from zero for the cell types except CD4+ and CD8+, whose bias-corrected estimates were, respectively, 75.9%, 95% confidence interval (67%; 85%) and 8.6%, 95% confidence interval (0%; 17%), for cases and controls, consistent with the known distribution of these T cells. For the analysis of case vs. control within total leukocytes, R_(1,0) ² was estimated at 4.5%, and R_(1,1) ² was estimated at 67.6%. For the analysis of total leukocyte vs. T cell with pooled cases and controls, R_(1,0) ² was estimated at 81.4%, and R_(1,1) ² was estimated at 98.9%. The latter set of coefficients of determination indicates that a substantial portion of variation is explained by composition of leukocytes, which is the expected result for such an analysis.
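The bias-corrected B cell estimate and its 95% confidence interval quoted above can be reproduced from the Est, Bias₂ and SE₂ entries for B Cell (case status, total leukocytes) reported for this analysis. The sketch below assumes, as an inference from the reported numbers rather than a stated rule, that the corrected estimate is Est minus Bias₂ and that the interval is a normal-approximation interval of ±1.96·SE₂; the function name `bias_corrected_ci` is hypothetical.

```python
def bias_corrected_ci(est, bias, se, z_crit=1.96):
    """Return (corrected estimate, lower bound, upper bound), all in percentage points."""
    corrected = est - bias        # subtract the double-bootstrap bias estimate
    half_width = z_crit * se      # normal-approximation half width
    return corrected, corrected - half_width, corrected + half_width

# B Cell row for case status (total leukocytes): Est, Bias2, SE2
corrected, lower, upper = bias_corrected_ci(est=-4.87, bias=-0.03, se=0.69)
```

Rounded to one decimal place, this gives a decrease of 4.8 percentage points with interval (−6.2; −3.5), in agreement with the values quoted in the text.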

TABLE 9 Estimates for Down syndrome analysis (case vs. control, total leukocyte vs. T Cell)

                     Est      Bias₂    SE₀     SE₁     SE₂     P-value
Case Status (total leukocytes)
Intercept, γ₀        2.02    −0.10    0.86    1.17    1.17    0.084
B Cell              −4.87    −0.03    0.62    0.70    0.69    <0.0001
Granulocyte          3.85     0.15    1.02    3.01    2.98    0.20
Monocyte             0.12     0.11    1.03    0.97    0.96    0.90
NK                  −0.63    −0.06    1.16    0.83    0.82    0.44
T Cell (cd4+)       −0.30    −0.37    4.02    2.49    2.66    0.91
T Cell (cd8+)       −1.89     0.35    4.03    2.47    2.42    0.43
T Cell (cases + controls)
Intercept, γ₀       −0.97     0.07    1.7     1.4     1.6     0.54
B Cell              −0.51     0.02    1.2     1.2     1.2     0.67
Granulocyte        −56.21     0.49    2.1     3.4     3.4     <0.0001
Monocyte            −5.13    −0.37    2.1     1.1     1.3     <0.0001
NK                   0.07     0.34    2.3     1.5     1.7     0.97
T Cell (cd4+)       60.18    −2.89    8.1     3.2     5.2     <0.0001
T Cell (cd8+)        3.00     2.34    8.2     3.3     5.4     0.58

Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.

Example 9 Application of the Methods Herein to Obesity in an African American Population

The method herein was also applied to an obesity data set (Wang et al., 2010) consisting of seven lean African-Americans and seven obese African-Americans (GEO Accession number GSE25301). FIG. 8 shows a clustering heatmap displaying the DNA methylation data. In this analysis, z consisted of obesity status. Obese subjects had an estimated increase of 12 percentage points in granulocytes, bias-corrected 95% confidence interval (3.4%; 20%), and an estimated decrease of 4 percentage points in NK cells, bias-corrected 95% confidence interval (−7.7%; −0.9%) (Table 10). No significant differences were found for other blood cell types. The specific immunological differences estimated by the method herein are consistent with known immunological perturbations associated with type II diabetes (Lynch et al., 2009, Obesity, 17(3), 601-5; Anderson et al., 2011, Curr Opin Lipidol, 21(3), 172-7).

TABLE 10 Estimated Regression Coefficients for Data Set concerning Obesity in African Americans

                     Est     Bias₂    SE₀     SE₁     SE₂     P-value
Obese
Intercept, γ₀        0.96   −0.09    1.08    0.85    0.84    0.25
B Cell               0.70   −0.03    0.78    1.16    1.14    0.54
Granulocyte         12.25    0.51    1.30    4.27    4.27    0.0041
Monocyte            −0.70   −0.01    1.31    1.57    1.54    0.65
NK                  −4.42   −0.13    1.46    1.75    1.73    0.011
T Cell (cd4+)       −6.97   −0.29    5.11    6.27    5.49    0.20
T Cell (cd8+)       −2.29    0.22    5.13    4.97    4.36    0.60

Est = Regression coefficient estimate (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). SE₀ = Naive standard error (×100%). SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). P-values were computed using SE₂.

Example 10 Additional Analyses

In this example a special case was considered in which the subject population was such that z=0 for this population and the population was sufficiently homogeneous with respect to blood cell distribution to admit sensible characterization of that distribution. In such a case it is possible to recover estimates of the distribution from Γ̂. The results of such an analysis applied to the HNSCC case/control data set are shown in Table 11 below.

TABLE 11 White Blood Cell Distribution in HNSCC Controls

                     Est     SE₂     Bias₂    BC-Est   95% Conf. Int.
B Cell               7.9     0.5     0.1      7.8      (6.8, 8.9)
Granulocyte         42.2     1.2    −0.1     42.3      (39.9, 44.6)
Monocyte             9.9     0.7     0.3      9.6      (8.3, 10.9)
NK                   7.9     0.7     0.2      7.7      (6.3, 9.1)
T Cell (cd4+)       15.2     3.0    −0.1     15.3      (9.5, 21.2)
T Cell (cd8+)        7.6     3.0     0.4      7.2      (1.4, 13.0)

Est = Regression coefficient estimate (×100%), normalized so that estimates sum to 90%. SE₂ = Double-bootstrap standard error (×100%). Bias₂ = Double-bootstrap bias estimate (×100%). BC-Est = bias-corrected estimate.

If the coefficients represented a complete profiling of blood cell types, the estimates should sum approximately to one, even though the model does not explicitly constrain them so. In this case, the original bias corrected estimates (of leukocyte distribution in HNSCC controls) summed to 133%. The table shows the values re-normalized to 90%, the anticipated proportion of the cell types. The resulting estimated distribution of leukocytes is consistent with the literature (Alberts B et al., 2008, Molecular Biology of the Cell. New York, N.Y.: Taylor and Francis, 5^(th) edition).
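The re-normalization described above can be sketched as a proportional rescaling. This is a minimal illustration that assumes the rescaling was a simple common multiplier (the text does not state the exact rule); the raw values below are hypothetical and were chosen only so that they sum to 133, and the function name `renormalize` is likewise hypothetical.

```python
def renormalize(estimates, target_total):
    """Scale all estimates by one common factor so that they sum to target_total."""
    total = sum(estimates.values())
    return {name: value * target_total / total for name, value in estimates.items()}

# Hypothetical unnormalized estimates (percent); NOT values from the data set
raw = {
    "B Cell": 11.5, "Granulocyte": 62.5, "Monocyte": 14.2,
    "NK": 11.4, "T Cell (cd4+)": 22.7, "T Cell (cd8+)": 10.7,
}
scaled = renormalize(raw, target_total=90.0)
```

Because every entry is multiplied by the same factor (here 90/133), the relative proportions among cell types are unchanged by the rescaling.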

An additional analysis was also conducted in which S₀ consisted of only samples with pure CD4+ or CD8+ cells and S₁ consisted only of samples having the less purified T-lymphocytes. For such S₁, there were no covariates, so z consisted only of an intercept. The following unnormalized bias-corrected estimates were obtained: 69.0% CD4+, 95% confidence interval (54%; 84%), and 32.5% CD8+, 95% confidence interval (19%; 46%). This is consistent with known proportions of these specific cell types among T lymphocytes.

Example 11 Sensitivity Analysis

The bias estimates evident from the double-bootstrap procedure admit the possibility of correcting the bias arising from measurement error. There is no statistical procedure for correcting the other possible sources of bias, those arising from unprofiled cell types and non-cell-mediated profile differences, i.e. methylation difference signatures δ with nonzero projection onto the space spanned by the WBC signatures. It is possible to conduct a sensitivity analysis using the theory presented under “Bias” (equations 6-9). It is shown that the magnitude of the bias is likely to be small, less than a percentage point.

Detailed Analysis

A method of sensitivity analysis to estimate the magnitude of bias arising from unprofiled cell types and non-cell-mediated profile differences is described below for the HNSCC data set presented in Example 6 and FIG. 4.

For each value of k ∈ {1, …, m}, a set J_(k) ⊂ {1, …, m} of k elements is randomly sampled without replacement, then k rows of B₁ are sampled without replacement, δ* is set equal to the m×d₁ zero matrix, and the rows indicated by J_(k) are substituted by the k rows selected from B₁. The matrix δ* served as a representative of the sum of processes having systematic methylation changes at k locations, of total magnitude consistent with the observed data (under the conservative assumption that no systematic methylation difference is cell mediated), and α*=(B₀ᵀB₀)⁻¹B₀ᵀδ* represented the corresponding bias in Γ. If, as in this situation, the goal was to assess the sensitivity to bias in one column of B₁ (i.e. Case Status), the uninteresting columns of δ* or α* could be simply deleted. Replicating this resampling procedure 100,000 times, an approximation to the distribution of possible biases corresponding to processes involving exactly k CpG sites was generated. FIG. 4 displays the results of such an analysis, showing the distribution of (α*ᵀα*)^(−1/2) for various values of k. It is noted that the relationship of median values to m was consistent with the theory presented in Example 12 under the subheading “Additional simulations.” The median values of (α*ᵀα*) had an almost perfect linear relationship with m. The magnitude of the bias was small: for the more likely low values of k, the bias was 0.1 to 0.25 of a percentage point. In addition, this analysis was conservative in that it assumed the effect in B₁ was due to non-cell-mediated processes, a strongly conservative assumption. In addition, for various choices of π₀ over a range of small magnitudes, the expected bias over the uniform posterior implied by π₀ was computed by iterated expectation, first by computing the mean bias for each choice of k, then forming the expectation over the binomial distribution Bin(100, π₀). As noted in details described under “Bias” in Example 3, the result scaled linearly with π₀. The constant of proportionality was estimated to be 2.08 percentage points. In summary, if the prior expectation is of even moderate size (˜0.1) that any one CpG among the 100 selected for this application will show systematic differentiation between cases and controls, then the implied bias would be expected to be less than a percentage point.
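The iterated expectation over Bin(100, π₀) can be illustrated with a short Python sketch. The per-k mean bias used here is an assumption made purely for illustration: it is taken to be linear in k with a slope chosen so that the expectation equals 2.08·π₀ percentage points, the constant of proportionality quoted above. The resampled per-k biases themselves are not reproduced, and the function names are hypothetical.

```python
import math

def expected_bias(pi0, mean_bias_by_k, n_sites=100):
    """Average mean_bias_by_k(k) over k ~ Binomial(n_sites, pi0)."""
    total = 0.0
    for k in range(n_sites + 1):
        prob_k = math.comb(n_sites, k) * pi0**k * (1.0 - pi0) ** (n_sites - k)
        total += prob_k * mean_bias_by_k(k)
    return total

def linear_mean_bias(k):
    """Assumed (illustrative) mean bias in percentage points when k CpG sites are affected."""
    return 2.08 * k / 100.0

bias_at_0_1 = expected_bias(0.1, linear_mean_bias)   # prior expectation pi0 of about 0.1
bias_at_0_2 = expected_bias(0.2, linear_mean_bias)
```

Under this linear assumption the expected bias at π₀ = 0.1 is 0.208 of a percentage point, below one percentage point, and doubling π₀ doubles the expected bias.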

Example 12 Simulations

To verify the properties of the proposed methodology, extensive simulation studies were conducted. Simulation parameters were obtained from the HNSCC data set, and most simulations assumed no sources of biological bias (DNA methylation changes arising from processes not mediated by the profiled leukocytes, including shifts in distribution within cell types not profiled). In every simulation, S₀ was specified to consist of five B cell samples, ten granulocyte samples, five monocyte samples, 15 NK samples, five general T cell samples, eight specific CD4+ T cell samples, and two specific CD8+ T cell samples. Estimates from the external validation set S₀, described above, were used for mean methylation profiles among WBC types, using the m=100 most informative CpG sites.

n₁/2 cases and n₁/2 controls were specified, with n₁ ∈ {100, 200, 500}. Among the controls, methylation profiles were generated by a white blood cell population of 7% B cells, 62% granulocytes, 6% monocytes, 2% NK cells, and 13% T cells, of which 65% were CD4+ cells and 35% were CD8+ cells, and the remaining 5% were unspecified (and assumed to have mean equal to the unsorted T-lymphocytes). Among cases, one of the following scenarios was specified: a 4% reduction in CD4+ cells, a 2% reduction in CD8+ cells, and an 8% increase in granulocytes (alternative with changes in both CD4+ and CD8+, “Strong Alternative I”); a 6% reduction in CD4+ cells, and an 8% increase in granulocytes (alternative with changes in CD4+ and not CD8+, “Strong Alternative II”); a weaker alternative with half the effects of Strong Alternative I (“Mixed Alternative”, elaborated upon below); and two null scenarios with no changes in cell population, each with a different assumption about δ. It is noted that these changes reflect absolute changes in percentage points, not relative changes. It is also noted that these values were actually used to generate Dirichlet-distributed mixture weights for each simulated subject, with Dirichlet parameters equal to a precision parameter (10 corresponding to “noisy”, and 100 corresponding to “precise”) times the mean weight described above.
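Generation of Dirichlet-distributed mixture weights of this kind can be sketched with the Python standard library by normalizing independent gamma draws, a standard construction that is not necessarily the one used for the simulations herein. The function name `sample_dirichlet` and the ordering of the components are assumptions for illustration. The listed control means are passed through as given; note that a Dirichlet draw sums to one by construction, so the realized mean of each component is its listed mean divided by the total of the listed means.

```python
import random

def sample_dirichlet(mean_weights, precision, rng):
    """Draw one Dirichlet vector with parameters precision * mean_weights."""
    draws = [rng.gammavariate(precision * w, 1.0) for w in mean_weights]
    total = sum(draws)
    return [d / total for d in draws]

# Control means: B, granulocyte, monocyte, NK, CD4+ (65% of T), CD8+ (35% of T), unspecified
control_means = [0.07, 0.62, 0.06, 0.02, 0.13 * 0.65, 0.13 * 0.35, 0.05]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
precise_weights = sample_dirichlet(control_means, precision=100, rng=rng)
noisy_weights = sample_dirichlet(control_means, precision=10, rng=rng)
```

A larger precision parameter concentrates the draws around the mean weights, which corresponds to the “precise” setting, while the smaller value yields the more heterogeneous “noisy” setting.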

Residual effects ξ_(i) ⁽⁰⁾ for controls were set equal to 0.1 times estimated intercept μ₁ and residual effects ξ_(i) ⁽¹⁾ for cases were set equal to 0.08 or 0.09 times μ₁ plus multiples 10θ of the column of U corresponding to case. The constants of proportionality 0.1, 0.08, and 0.09 were chosen to correspond to assumed contributions of ξ to an overall methylation signature presumed to be dominated by profiled populations of white blood cells in specified proportions, with 0.08 used for the strong alternatives and 0.09 used for the Mixed Alternative. The constant 10 was used to amplify the scale of δ so that its effect could be detected in simulation; it is noted that U was orthogonal to the white blood cell profiles, by construction.

It is noted also that the individual, Dirichlet-generated subject weights did not necessarily sum to one, and the difference from 1 was not applied as a multiplier; thus the resulting ξ corresponded to the situation Pμ_(q)=0, where P=(B₀ᵀB₀)⁻¹B₀ᵀ, along with orthogonal contributions from the λ terms of (6). The multiplier θ=0 was used for the strong alternatives and for the “Strong Null” case (i.e. no methylation differences between cases and controls), θ=0.5 was used for the Mixed Alternative, and θ=1 was used for the “Mixed Null”, with case/control differences not mediated by cellular population differences.

A simple normal error structure for e_(oh) and e_(oi) was specified, with no chip effects, and with variance equal to the sum of chip and residual variance estimated (individually for each CpG) for the HNSCC data. For each simulation, 50 bootstraps were used to estimate standard errors. 1000 simulations were run for each scenario. Table 12 presents results for n₁=200 with precise mixture weights (small within-status heterogeneity in distribution), and Table 13 presents results for n₁=200 with noisy mixture weights (larger within-status heterogeneity). The tables show mean estimate, simulation standard deviation, median estimates for the three types of proposed standard errors, and proportion of p-values (obtained from z-scores constructed using the double-bootstrap standard error) falling below α=0.05 and α=0.01.
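The proportion of p-values falling below α can be computed as in the following sketch, which assumes two-sided normal p-values formed from the ratio of each simulated estimate to its double-bootstrap standard error. The four estimate and standard-error pairs are hypothetical stand-ins for simulation output, and the function name `rejection_rate` is an assumption.

```python
import math

def rejection_rate(estimates, std_errors, alpha):
    """Proportion of simulations whose two-sided z-test p-value falls below alpha."""
    rejections = 0
    for est, se in zip(estimates, std_errors):
        p_value = math.erfc(abs(est / se) / math.sqrt(2.0))
        if p_value < alpha:
            rejections += 1
    return rejections / len(estimates)

# Hypothetical estimates and standard errors from four simulated data sets
estimates = [8.1, 0.3, -4.2, 1.0]
std_errors = [0.7, 0.9, 0.8, 1.0]

pow_05 = rejection_rate(estimates, std_errors, alpha=0.05)
pow_01 = rejection_rate(estimates, std_errors, alpha=0.01)
```

In a null scenario this proportion estimates the type I error rate, and in an alternative scenario it estimates power.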

In these cases, the bias in estimation was minimal. Both types of bootstrap produced similar standard error estimates, which were close to the simulation standard deviation and often quite different from the naive standard error estimate. Under null scenarios, the rejection probabilities were tolerably close to their nominal values, and for alternatives, power could be quite high, even with this modest design.

Results for Coefficients of Determination

Results for the coefficients of determination are provided in Table 14. R_(1,0) ² decreased with decreasing strength of the alternative, falling to zero under both null scenarios. For strong alternatives, R_(1,1) ² was frequently close to 1.0. For the Mixed Alternative, R_(1,1) ² had lower but still high values, ranging from about 0.85 to 0.90. For the mixed null result, R_(1,1) ² typically had lower values, from about 0.05 to 0.20. In the Strong Null case, R_(1,1) ² covered a broader range among moderately low values; note, however, that this scenario effectively represents 0/0, i.e. a poorly defined value. Scenarios with n₁ ∈ {100, 500} produced similar results, with simulation standard deviations and power adjusted accordingly, and still having practical utility.

Additional Simulations

Additional simulations were conducted which assumed bias arising from processes not mediated by the profiled leukocytes. For these scenarios, ξ⁰ was set to μ̂₁, and ξ¹=ξ⁰ except for a set of CpG sites randomly selected among the m dimensions of the array (once and for all before 1000 simulations); among those dimensions j, ξ¹ _(j) was set to 1−μ̂_(1j), reflecting a “reversal” of methylation state. Estimates were biased towards the null, on the order of about a percentage point.

TABLE 12 Simulation results (precise mixtures, n₁ = 200)

                   Truth    Est     SD     SE₀    SE₁    SE₂    pow(0.05)  pow(0.01)
Strong Alternative I (θ = 0)
B Cell              0.0     0.07   1.00   0.92   0.97   0.98   0.057      0.018
Granulocyte         8.0     8.02   0.73   0.39   0.73   0.73   1.000      1.000
Monocyte            0.0     0.01   0.48   0.43   0.47   0.47   0.055      0.013
NK                  0.0    −0.09   1.08   1.02   1.02   1.05   0.066      0.015
T Cell (cd4+)      −4.0    −4.06   0.81   0.80   0.78   0.81   0.999      0.989
T Cell (cd8+)      −2.0    −1.93   0.83   0.81   0.78   0.81   0.653      0.419
Strong Alternative II (θ = 0)
B Cell              0.0     0.00   0.97   0.92   0.97   0.99   0.048      0.016
Granulocyte         8.0     8.00   0.71   0.39   0.72   0.72   1.000      1.000
Monocyte            0.0     0.03   0.48   0.42   0.47   0.47   0.063      0.016
NK                  0.0     0.03   1.04   1.02   1.01   1.05   0.052      0.014
T Cell (cd4+)      −6.0    −5.83   0.76   0.80   0.77   0.80   1.000      1.000
T Cell (cd8+)       0.0    −0.22   0.81   0.81   0.80   0.81   0.064      0.014
Mixed Alternative (θ = 0.5)
B Cell              0.0    −0.02   1.02   1.10   0.96   0.98   0.065      0.011
Granulocyte         4.0     3.99   0.75   0.47   0.73   0.73   1.000      0.995
Monocyte            0.0     0.02   0.49   0.51   0.47   0.47   0.060      0.015
NK                  0.0     0.04   1.05   1.22   1.01   1.04   0.054      0.009
T Cell (cd4+)      −2.0    −2.07   0.82   0.96   0.79   0.83   0.695      0.471
T Cell (cd8+)      −1.0    −0.95   0.82   0.96   0.78   0.82   0.203      0.082
Mixed Null (θ = 1)
B Cell              0.0     0.00   1.04   1.58   0.96   1.02   0.066      0.017
Granulocyte         0.0     0.03   0.73   0.67   0.74   0.74   0.055      0.014
Monocyte            0.0    −0.01   0.47   0.73   0.47   0.48   0.054      0.013
NK                  0.0    −0.01   1.12   1.76   1.01   1.09   0.063      0.014
T Cell (cd4+)       0.0     0.01   0.87   1.38   0.80   0.90   0.054      0.013
T Cell (cd8+)       0.0    −0.02   0.88   1.39   0.79   0.89   0.057      0.015
Strong Null (θ = 0)
B Cell              0.0    −0.01   0.99   0.90   0.96   0.96   0.068      0.014
Granulocyte         0.0     0.03   0.72   0.38   0.74   0.73   0.052      0.013
Monocyte            0.0    −0.01   0.47   0.42   0.47   0.47   0.055      0.013
NK                  0.0    −0.01   1.06   1.00   1.01   1.02   0.059      0.020
T Cell (cd4+)       0.0     0.00   0.81   0.78   0.80   0.82   0.054      0.013
T Cell (cd8+)       0.0    −0.01   0.81   0.79   0.79   0.80   0.054      0.015

Est = Mean regression coefficient estimate (×100%); SD = SD regression coefficient estimate (×100%). SE₀ = Naive standard error (×100%); SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). pow(α) = Pr{P₂ < α}, where P₂ is the p-value computed from SE₂.

TABLE 13 Simulation Results (Noisy Mixtures, n₁ = 200)

                   Truth    Est     SD     SE₀    SE₁    SE₂    pow(0.05)  pow(0.01)
Strong Alternative I (θ = 0)
B Cell              0.0    −0.06   1.39   0.92   1.36   1.34   0.065      0.019
Granulocyte         8.0     7.87   2.02   0.39   2.00   1.99   0.974      0.897
Monocyte            0.0     0.05   1.03   0.42   1.04   1.02   0.049      0.012
NK                  0.0    −0.02   1.21   1.02   1.16   1.18   0.061      0.010
T Cell (cd4+)      −4.0    −4.00   1.23   0.79   1.21   1.22   0.903      0.739
T Cell (cd8+)      −2.0    −1.97   1.05   0.80   1.02   0.98   0.517      0.298
Strong Alternative II (θ = 0)
B Cell              0.0    −0.08   1.38   0.92   1.36   1.34   0.063      0.017
Granulocyte         8.0     7.90   2.03   0.39   1.99   1.98   0.973      0.905
Monocyte            0.0     0.10   1.07   0.42   1.04   1.02   0.054      0.019
NK                  0.0     0.02   1.17   1.02   1.14   1.18   0.053      0.009
T Cell (cd4+)      −6.0    −5.70   1.19   0.80   1.13   1.16   0.999      0.986
T Cell (cd8+)       0.0    −0.23   1.08   0.81   1.10   1.04   0.066      0.015
Mixed Alternative (θ = 0.5)
B Cell              0.0     0.05   1.42   1.10   1.34   1.34   0.066      0.016
Granulocyte         4.0     4.00   2.01   0.47   2.02   2.01   0.500      0.291
Monocyte            0.0     0.01   1.06   0.51   1.03   1.02   0.072      0.020
NK                  0.0    −0.02   1.24   1.22   1.13   1.16   0.064      0.013
T Cell (cd4+)      −2.0    −2.11   1.30   0.95   1.26   1.28   0.391      0.191
T Cell (cd8+)      −1.0    −0.94   1.08   0.96   1.05   1.02   0.163      0.052
Mixed Null (θ = 1)
B Cell              0.0     0.06   1.41   1.59   1.36   1.37   0.062      0.016
Granulocyte         0.0     0.04   2.08   0.67   2.06   2.05   0.056      0.008
Monocyte            0.0    −0.02   1.05   0.73   1.03   1.03   0.058      0.020
NK                  0.0     0.01   1.26   1.76   1.14   1.22   0.066      0.011
T Cell (cd4+)       0.0    −0.01   1.42   1.38   1.31   1.36   0.067      0.016
T Cell (cd8+)       0.0     0.00   1.19   1.39   1.08   1.10   0.073      0.011
Strong Null (θ = 0)
B Cell              0.0     0.06   1.37   0.91   1.36   1.32   0.065      0.017
Granulocyte         0.0     0.03   2.07   0.38   2.06   2.05   0.055      0.009
Monocyte            0.0    −0.02   1.04   0.42   1.03   1.02   0.057      0.021
NK                  0.0     0.01   1.19   1.01   1.14   1.16   0.053      0.018
T Cell (cd4+)       0.0    −0.04   1.38   0.79   1.31   1.31   0.069      0.015
T Cell (cd8+)       0.0     0.01   1.11   0.79   1.08   1.03   0.065      0.016

Est = Mean regression coefficient estimate (×100%); SD = SD regression coefficient estimate (×100%). SE₀ = Naive standard error (×100%); SE₁ = Single-bootstrap standard error (×100%). SE₂ = Double-bootstrap standard error (×100%). pow(α) = Pr{P₂ < α}, where P₂ is the p-value computed from SE₂.

TABLE 14 Results for coefficients of determination

                                  Median R_(1,0) ²          Median R_(1,1) ²
                                  (Interquartile Range)     (Interquartile Range)
Precise Mixtures, n₁ = 200
Strong Alternative I (θ = 0)      0.13 (0.12-0.15)          0.98 (0.97-0.98)
Strong Alternative II (θ = 0)     0.13 (0.12-0.15)          0.98 (0.97-0.98)
Mixed Alternative (θ = 0.5)       0.04 (0.03-0.05)          0.88 (0.85-0.91)
Mixed Null (θ = 1)                0.00 (0.00-0.00)          0.10 (0.05-0.17)
Strong Null (θ = 0)               0.00 (0.00-0.00)          0.25 (0.15-0.38)
Noisy Mixtures, n₁ = 200
Strong Alternative I (θ = 0)      0.05 (0.03-0.06)          0.98 (0.97-0.98)
Strong Alternative II (θ = 0)     0.05 (0.03-0.06)          0.98 (0.97-0.98)
Mixed Alternative (θ = 0.5)       0.01 (0.01-0.02)          0.89 (0.81-0.94)
Mixed Null (θ = 1)                0.00 (0.00-0.01)          0.46 (0.28-0.64)
Strong Null (θ = 0)               0.00 (0.00-0.01)          0.72 (0.55-0.85)

Example 13 Identification of a Unique DMR in CD3Z Gene

Individual samples of sorted, normal, human, peripheral blood leukocytes, as shown in Table 15, were purchased from AllCells®, LLC (Emeryville, CA). These leukocytes were sorted in a column with antibody-conjugated magnetic beads using a combination of positive and negative selection. Genomic DNA from the leukocytes was extracted using the DNeasy Blood & Tissue kit (Qiagen) or the AllPrep DNA/RNA/Protein Mini Kit (Cat. No. 8004, QIAGEN, Valencia, Calif.) according to the manufacturer's protocol, then quantified by NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Inc., Wilmington, Del.) and stored at −20° C. The extracted genomic DNA was subjected to bisulfite conversion by treatment with sodium bisulfite using the EZ DNA Methylation Kit (Zymo) following the manufacturer's protocol, thereby converting unmethylated cytosine residues to uracil and leaving methylated cytosine residues intact.

TABLE 15 Sorted leukocytes from AllCells®, LLC

Cell Lineage                                               Abbreviation   N
CD3+ T Lymphocytes                                         Pan-T          5
CD3+CD4+ T Lymphocytes                                     CD4            2
CD3+CD4+CD25+ Regulatory T Lymphocytes                     Treg           6
CD3+CD8+ T Lymphocytes                                     CD8            2
CD56+ Natural Killer Cells (Large Granular Lymphocytes)    NK             3
CD19+ B Lymphocytes                                        B              5
CD14+ Monocytes                                            Mono           4
CD15+ Granulocytes                                         Gran           5
CD16+ Neutrophils                                          Neut           4

Analysis of the methylation status of the bisulfite converted DNA was performed using a DNA methylation microarray, the Infinium® HumanMethylation27 Beadchip Microarray (Illumina®, Inc., San Diego, Calif.). This microarray quantifies the methylation status of 27,578 CpG loci from 14,495 genes, with a redundancy of 15-18-fold. Bisulfite converted genomic DNA from sorted human peripheral blood leukocytes was subjected to whole genome amplification. The purified whole genome amplified DNA was hybridized to locus-specific DNA oligomers linked to individual bead types corresponding to each CpG locus, unmethylated or methylated. Allele-specific primer annealing was followed by specific single-base extension using labeled ddNTPs. Extension only occurs if the bead type matches the methylation status of the genomic DNA.

The array was fluorescently stained and scanned, and fluorescent intensities of each of the unmethylated (U) and methylated (M) bead types were measured. The ratio of fluorescent signals is computed from both alleles using the following equation: β=max(M,0)/(|U|+|M|+100). The β-value is a continuous variable ranging from 0 (unmethylated) to 1 (completely methylated) that represents the methylation at each CpG site and is used in subsequent statistical analyses. Data were assembled with BeadStudio methylation software from Illumina, Inc. (San Diego, Calif.) (Bibikova, M., et al., Epigenomics 1, 177-200 (2009)).
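The β-value computation can be written as a one-line function. The sketch below places the constant 100 in the denominator, which keeps β between 0 and 1 as the text requires; the function name `beta_value` and the example intensities are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation beta-value from methylated (M) and unmethylated (U) signal intensities."""
    # The offset stabilizes the ratio when both intensities are low
    return max(methylated, 0.0) / (abs(unmethylated) + abs(methylated) + offset)

fully_unmethylated = beta_value(methylated=0.0, unmethylated=5000.0)
mostly_methylated = beta_value(methylated=4900.0, unmethylated=0.0)
```

With these example intensities, a locus with no methylated signal yields a β of 0, and a locus with a strong methylated signal and no unmethylated signal yields a β close to, but below, 1.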

A comparison of methylation in sorted normal human immune cells was observed to produce distinct profiles of methylation markers for further consideration. As shown in FIG. 9, DNA methylation profiles distinguished lymphocytes from myeloid derived leukocytes. Recursively partitioned mixture model (RPMM) of autosomal gene Infinium beta values from sorted, human, peripheral blood leukocytes was performed in R version 2.11.1 of Illumina's software which provides convenient mechanisms for loading and analyzing the results of methylation status, and for quality control and basic visualization tasks.

Candidate DNA regions with high potential to discriminate CD3+ T cells from non-T cells were chosen based on the criteria of being differentially demethylated and differentially overexpressed in CD3+ T cells compared with other cell types (monocytes, granulocytes, NK cells, and B cells). Two quantitative methylation methods, bisulfite pyrosequencing and MS-qPCR, were used to confirm array methylation.

The highest ranking 5000 most variable CpG loci were plotted on the left (FIG. 9 left panel), such that the less methylated loci appear as grey and more methylated loci appear as black. The number of individual leukocyte samples in each methylation class is shown in FIG. 9 in the table to the right. The algorithm for prioritizing these candidates described herein yielded CD3E and CD3Z as specific DMR for identifying CD3+ T cells.

Example 14 Patient Characteristics and Biological Samples for Determining CD3+ T Cell Distribution in Glioma Cases and Controls

Whole blood samples from glioma patients (N=94) and controls (N=71) were obtained from the UCSF San Francisco Adult Glioma Study (AGS) for these examples (Table 16). The patients included in this example were diagnosed between 1997 and 2011. Details of subject ascertainment through the rapid case ascertainment program of San Francisco regional population-based registry or the UCSF Neuro-oncology Clinic have been described (Wrensch M et al., 2007, Clin Cancer Res 13(1): 197-205; Felini M J et al. 2009, Cancer Causes Control 20(1): 87-96; Wrensch M et al., 2009, Nat Genet. 41(8): 905-8; Christensen B C et al., 2011, J Natl Cancer Inst 103(2): 143-53). Pertinent data for this analysis included age at histological diagnosis, gender, vital status, and survival time between diagnosis date and date of death for those deceased or between diagnosis date and date of last contact for those alive, and any of cigarette smoking history and exposure to steroids, chemotherapy and radiation therapy.

A panel of 120 fresh frozen glioma tumors from the UCSF Brain Tumor Research Center tissue bank, obtained under appropriate institutional review board approval, which were previously characterized for molecular features (Christensen B C et al., 2011, J Natl Cancer Inst 103(2): 143-53; Zheng S et al., 2011, Neuro Oncol 13(3): 280-9) was chosen for tumor MS-qPCR and IHC studies (Table 16). Tumor samples were defined as secondary GBM if the patients had a prior histological diagnosis of a low-grade glioma. The ages are given at the time of surgery, which occurred at UCSF between 1990 and 2003. This tumor set contained the following histological subtypes: 2 pilocytic astrocytoma (PA), 15 ependymoma grade II (EPII), 20 oligodendroglioma grade II (ODII), 16 oligoastrocytoma grade II (OAII), 3 oligoastrocytoma grade III (OAIII), 23 astrocytoma grade II (ASII), 4 astrocytoma grade III (ASIII) and 37 astrocytoma grade IV, also called glioblastoma multiforme grade IV (GBM), ten of which were recurrent and five of which were secondary.

Sorted, normal, human, peripheral blood leukocyte subtypes were isolated from different non-diseased individuals' whole blood by MACS using a combination of negative and positive selection with highly specific cell surface antibodies conjugated to magnetic beads. The purity of separated cells was determined with flow cytometry to be >97%.

Example 15 Bisulfite Pyrosequencing and MS-qPCR Assays for Validating CD3Z, CD3E and FOXP3 Specific DMRs

The demographic characteristics of donors for samples (N=285) used in MS-qPCR analysis are shown in Table 16. CpGenome Universal Methylated DNA (Cat. No. S7821, Millipore Corp., Temecula, Calif.), purified T cell DNA and Treg DNA were bisulfite converted at the same time. Bisulfite pyrosequencing assays were designed using Pyromark Assay Design 2.0 (QIAGEN), and carried out using a Pyromark MD pyrosequencer running Pyromark qCpG software (QIAGEN). Custom oligonucleotide primers used in bisulfite pyrosequencing were obtained from Invitrogen (Life Technologies Co, Carlsbad Calif.). For MS-qPCR reactions, primers and TaqMan minor groove binder (MGB) probes with 5′ 6FAM and 3′ non-fluorescent quencher (NFQ) as well as TaqMan 1000 RXN Gold with Buffer A Pack were obtained from Applied Biosystems (Part No. 4304971, 4316034 and 4304441, Applied Biosystems, Foster City, Calif.). The primer and probe sequences are shown in Table 17 and FIG. 12. Solutions for MS-qPCR (10× TaqMan Stabilizer containing 0.1% Tween-20 and 0.5% gelatin) were prepared weekly. Each reaction of 20 μl contained 5 μl DNA, 11.9 μl PreMix, 3 μl OligoMix, and 0.1 μl Taq DNA polymerase. Cycling was performed using a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, Calif.): a 10 min preheat at 95° C. followed by 50 cycles of 95° C. for 15 sec and 60° C. for 1 min. Samples were run in triplicate using the absolute quantification method. Copy number of the target locus in each sample was determined by reference to a four-point standard curve, which was based on known copies of bisulfite converted template.

TABLE 16 Demographic characteristics of donors for samples (N = 285) used in MS-qPCR analysis

Characteristic               Control Blood      Case Blood         Excised Tumors
                             samples (n = 94)   samples (n = 71)   (n = 120)
Age
  Median (range)             57 (22-87)         57 (20-86)         41 (1-78)
  Mean (standard deviation)  55 (16.5)          56 (13)            41 (15)
Gender, No (%)
  Female                     43 (46%)           26 (36%)           42 (35%)
  Male                       51 (54%)           45 (64%)           78 (65%)
Race, No (%)
  White, Non-Hispanic        78 (83%)           67 (95%)           102 (85%)
  Hispanic                   3 (3%)             3 (4%)             7 (6%)
  Asian                      6 (7%)             0 (0%)             4 (3%)
  Black                      5 (6%)             0 (0%)             0 (0%)
  Other                      1 (1%)             1 (1%)             7 (6%)

Quantification of total bisulfite converted DNA copies for standard and biological samples was determined by reference to the C-less qPCR assay as described previously (Weisenberger D J et al., 2008, Nucleic Acids Res 36(14): 4689-98; Campan M et al., 2009, Methods Mol Biol 507: 325-37). In this procedure one determines the relative amounts of a bisulfite converted sample through the use of a TaqMan PCR reaction using primers and probes that recognize a DNA strand that does not contain cytosines, and hence is able to amplify the total amount of DNA (bisulfite-converted or unconverted) in a PCR reaction well. The absolute copy number in DNA Standard Solution (Cambio Ltd., Cambridge, UK) was used to calibrate the C-less reaction, assuming 3.3 pg = 1 genome copy. Universal methylated DNA and purified CD3+ T cell and Treg DNA (bisulfite converted) were quantified at the same time. Since C-less primers hybridize to both strands of the standard DNA (non-bisulfite converted) and bisulfite converted samples allow for only single strand hybridization during the first cycle, the resultant copy number in bisulfite samples is multiplied by two. After the C-less assay, the copy number of the different standards (universal methylated, CD3+ T cell and Treg DNA) was used to create standard curves for CD3Z and FOXP3. To create a calibration curve, known quantities of CD3+ T cell or Treg DNA were spiked into universal methylated DNA in ratios that maintained a constant total copy number in each reaction across the dilution scheme. The latter procedure mimics the conditions of detection that exist in differentiating different relative numbers of CD3+ T cells and Tregs within a mixture of cells in a complex biological sample. For absolute quantification of CD3Z, the four-point standard curve used 10,000, 1,000, 100, and 10 bisulfite converted CD3+ T cell DNA copies; absolute quantification of FOXP3 used 5,000, 500, 50 and 5 bisulfite converted Treg cell DNA copies.
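By way of illustration only, the standard curve arithmetic described above can be sketched as follows. The sketch assumes an ordinary least-squares fit of Ct against log10 of the known copy number; the fitting method is not stated in this example, and the function names and the Ct values in the usage note are hypothetical. The 3.3 pg per genome copy conversion and the two-fold correction for single strand templates are taken from the text.

```python
import math

GENOME_MASS_PG = 3.3  # from the text: 3.3 pg is assumed to equal 1 genome copy


def genome_copies_from_mass_pg(mass_pg):
    """Convert a DNA mass in picograms to genome copies."""
    return mass_pg / GENOME_MASS_PG


def fit_standard_curve(known_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    The choice of ordinary least squares is an assumption of this sketch.
    """
    xs = [math.log10(c) for c in known_copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept, single_strand_template=False):
    """Read a copy number off the fitted curve.

    Set single_strand_template=True for a bisulfite converted sample measured
    against a non-converted (double strand) C-less standard, in which case the
    result is multiplied by two as described in the text.
    """
    copies = 10 ** ((ct - intercept) / slope)
    return 2 * copies if single_strand_template else copies
```

For instance, with hypothetical Ct values of 20.0, 23.3, 26.6 and 29.9 for the 10,000, 1,000, 100 and 10 copy standards, `fit_standard_curve` returns a slope of about -3.3 and an intercept of about 33.2, and a test sample with a Ct of 26.6 is read as about 100 copies.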

TABLE 17 Primer and probe sequences for MS-qPCR assays

Oligonucleotide Name   Sequence (5′ to 3′)
C-less Fwd             TTGTATGTATGTGAGTGTGGGAGAGA (SEQ ID NO: 97)
C-less Rev             TTTCTTCCACCCCTTCTCTTCC (SEQ ID NO: 98)
C-less Probe           (6FAM)CTCCCCCTCTAACTCTAT(MGB, NFQ) (SEQ ID NO: 99)
CD3Z Fwd               GGATGGTTGTGGTGAAAAGTG (SEQ ID NO: 100)
CD3Z Rev               CAAAAACTCCTTTTCTCCTAACCA (SEQ ID NO: 101)
CD3Z Probe             (6FAM)CCAACCACCACTACCTCAA(MGB, NFQ) (SEQ ID NO: 102)
FOXP3 Fwd              GGGTTTTGTTGTTATAGTTTTTG (SEQ ID NO: 103)
FOXP3 Rev              TTCTCTTCCTCCATAATATCA (SEQ ID NO: 104)
FOXP3 Probe            (6FAM)CAACACATCCAACCACCAT(MGB, NFQ) (SEQ ID NO: 105)

MGB: minor groove binder
FAM: 6-Carboxyfluorescein
NFQ: non-fluorescent quencher
C-less qPCR assay: Campan M et al., 2009, Methods Mol Biol 507: 325-37; Weisenberger D J et al., 2008, Nucleic Acids Res 36: 4689-98

The DNA methylation status of the CD3E specific DMR in the CD3E gene was measured by pyrosequencing bisulfite converted DNA from sorted, human, peripheral blood leukocytes (FIG. 10A). The DNA methylation status of the CD3Z specific DMR in the CD3Z gene was measured by MethyLight® qPCR of bisulfite converted DNA from sorted, human, peripheral blood leukocytes (FIG. 10B). The genomic region containing the CD3Z DMR is shown in FIG. 11.

Standard calibration curves were used to determine if the newly identified CD3Z DMR was useful to quantify CD3+ T cells, Tregs (FOXP3 demethylated) and ratios of Tregs/CD3+ T cells in biological specimens such as whole or separated blood or other tissues. To obtain these curves, quantitative real time methylation specific PCR was performed. DNA isolated from purified cell types was bisulfite converted and serially diluted into a background of fully methylated commercial DNA standard (Qiagen). This method is referred to herein as “CS-DM assay” or assays.

It was observed that the total genomic copy numbers of each sample within a dilution series remained constant. Log dilutions were prepared to include the appropriate range of Ct values corresponding to test samples (whole blood, tumor specimens). Using cytosine-less (C-less) primers, genome copy numbers for each test standard were measured to ensure adequate input DNA and to normalize the CD3+ and Treg assay values. The calibration curve for C-less total input is shown in FIG. 13A (N=8 replicates); errors denote standard error of the mean Ct value. FIG. 13B shows dilution of isolated normal PanT cells (N=7 replicates) and FIG. 13C shows dilution and calibration curve for isolated CD3+CD25+ T cells (N=8 replicates). For samples to be tested, these calibration curves (FIG. 13A-C) were used to estimate total input copies, CD3+ T cell copies, and Treg copies, respectively.
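The normalization step mentioned above can be illustrated with a short sketch. It assumes that percent demethylation is computed as 100 times the demethylated target copies divided by the C-less total input copies, and that the Treg fraction of T cells is computed as 100 times FOXP3 copies divided by CD3Z copies; these formulas are an assumption of the sketch, since the text states only that the assay values are normalized to total input. The function names are hypothetical.

```python
def percent_demethylation(target_copies, total_input_copies):
    """Demethylated target copies as a percentage of C-less total input copies."""
    if total_input_copies <= 0:
        raise ValueError("total input copies must be positive")
    return 100.0 * target_copies / total_input_copies


def treg_percent_of_t_cells(foxp3_copies, cd3z_copies):
    """Demethylated FOXP3 copies as a percentage of demethylated CD3Z copies."""
    if cd3z_copies <= 0:
        raise ValueError("CD3Z copies must be positive")
    return 100.0 * foxp3_copies / cd3z_copies
```

Under these assumptions, a hypothetical sample with 1,000 total input copies, 176 demethylated CD3Z copies and 8 demethylated FOXP3 copies would be reported as 17.6% CD3Z demethylation with Tregs making up about 4.5% of T cells.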

The results show that the region identified herein in the promoter of the CD3Z gene, whose DNA methylation status in sorted human peripheral blood leukocytes was validated as an immune cell type specific differentially methylated region (FIG. 10B), was observed to be useful to quantify CD3+ T cells in biological specimens such as whole or separated blood, or other tissues.

Example 16 Flow Cytometry of Blood Lymphocytes in Whole Blood for Quantification of CD3+ T Cells

Levels of CD3+ T cells in whole blood were quantified by flow cytometry for comparison with CD3+ T cell levels determined using the CD3Z MS-qPCR assay. Venous whole blood samples were collected in citrate EDTA and processed using a lysis no-wash protocol (Invitrogen, Carlsbad, Calif., cat #GAS-010). Cells were labeled by direct staining with the appropriate fluorochrome-conjugated antibodies (eBioscience Inc, San Diego, Calif.), namely anti-CD3-fluorescein isothiocyanate (FITC, cat #11-0038-41), anti-CD4-allophycocyanin (APC, cat #17-0048-41), anti-CD8-phycoerythrin (PE, cat #12-0086-41), and anti-CD45-PerCP-Cy5.5 (cat #45-0459-41), and were incubated for 20 minutes in the dark at 4° C. Isotype control mAbs were used as negative controls. AccuCheck counting beads (Invitrogen, Carlsbad Calif., cat #PCB100) were used for quantifying leukocyte numbers. Acquisition was performed within 48 hrs of blood draw on a FACSCalibur flow cytometer using Cell-Quest Software (Becton Dickinson, Franklin Lakes, N.J.). For CD3+ cells a minimum of 10,000 events were collected on the lymphocyte gate that was set on the forward scatter vs. side scatter (FSC vs. SSC) and then gated on CD3+ cells. CD45+ counts were obtained by first gating on non-bead events using the FSC vs. SSC. A CD45+ histogram plot of the non-bead events was then created. CD45+ cells were gated. Examples are seen in FIG. 18. Absolute counts (number of cells per μl) were obtained by taking the number of cells counted, divided by the total number of beads counted, multiplied by the known concentration of beads. FlowJo software (TreeStar Inc, Ashland, Oreg.) was used for data analysis.
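The absolute count calculation stated at the end of this example can be written out as a minimal sketch. The function name and the numbers in the usage note are hypothetical; the formula itself (cells counted, divided by beads counted, multiplied by the known bead concentration) is the one given in the text.

```python
def absolute_count_per_ul(cells_counted, beads_counted, bead_concentration_per_ul):
    """Absolute cell count (cells per microliter) from a bead-calibrated acquisition."""
    if beads_counted <= 0:
        raise ValueError("at least one bead event is required")
    return cells_counted / beads_counted * bead_concentration_per_ul
```

For example, 10,000 gated cell events acquired together with 5,000 bead events, at a hypothetical bead concentration of 1,000 beads per μl, corresponds to 2,000 cells per μl.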

Example 17 Tumor Immunohistochemistry (IHC) for Measuring Levels of Tumor Infiltrating Lymphocytes (TIL) in Glioma Tumors

Slides were prepared from a 5 micron slice of each FFPE tumor block. Slides were stained using a Benchmark XT instrument per manufacturer's instructions (Ventana, Tucson, Ariz.). CD3 antibody (Dako, Carpinteria, CA, cat #A0452) was added in a 1:600 dilution and incubated for 30 minutes. CD8 antibody (Dako, Carpinteria, CA, cat #M7103) was added in a 1:200 dilution and incubated for 60 minutes. CD4 antibody (Leica Microsystems, Buffalo Grove, Ill., cat #NCL-L-CD4-368) was added in a 1:50 dilution and incubated for 2 hours. Slides were counterstained with hematoxylin. Each slide was scanned at a magnification of 10× to identify four suitable fields that were then scored at 25× magnification. Examples are seen in FIG. 19A-C. The numbers of positive staining cells were recorded and the average count per four fields calculated. Photomicrographs were taken and scored for specimens with very high cell counts to increase accuracy. Samples were also examined to see if they contained predominantly perivascular and/or parenchymal infiltrates. A blind comparison of observation by two individuals was carried out to ensure uniform interpretation. Data from tumor IHC were analyzed in combination with CD3Z MS-qPCR data to determine association between the two data sets (see Example 19).

Example 18 Statistical Analysis of Differential Methylation in CD3+ T Cells for Identification of Cell-Specific DMRs

To identify putative cell specific DMRs, MACS sorted leukocyte DNA methylation data consisting of un-normalized average beta values from the Illumina HumanMethylation27 microarray were calculated from probe intensities using Illumina GenomeStudio. Locus by locus comparisons of DNA methylation between the sorted cell types were performed using a linear mixed effects model (controlling for beadchip) in SAS version 9.2, thereby generating estimates and p-values for differential methylation in CD3+ T cells compared to other cell types. Resultant p-values were adjusted for multiple comparisons using the qvalue package in the software program R project for statistical computing, version 2.13, available for downloading from the internet, and q-values of less than 0.05 were considered significant. Correlations, F-tests, Wilcoxon rank sum and Kruskal-Wallis one-way analysis of variance by ranks tests were carried out in R version 2.11.1 and survival analysis was performed using the survival package in R version 2.11.1.
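The multiple comparison adjustment can be illustrated with a simplified sketch. The example above used the qvalue package in R; the sketch below instead implements the Benjamini-Hochberg adjustment as a stand-in, which controls the false discovery rate but is not identical to the q-value estimate produced by that package. The function name and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    This is a stand-in for the q-value computation described in the text,
    not a reimplementation of the R qvalue package.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Applied to hypothetical locus p-values of 0.01, 0.04, 0.03 and 0.20, the adjusted values are approximately 0.040, 0.053, 0.053 and 0.20, so only the first locus would fall below the 0.05 threshold used herein.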

Example 19 Discovery and Validation of CD3Z Demethylation as a Marker of CD3+ T Cells

The search for genes containing DMRs specific for CD3+ T cells using methods herein revealed candidate CpG sites within the genes encoding several components of the T cell receptor (TCR) complex; namely, CD3D, CD3E, CD3G, and CD3Z. Myeloid derived blood cells (granulocytes, neutrophils, monocytes) and B-lymphocytes contained methylated CpG sites within CD3D, CD3E, CD3G and CD3Z loci compared with T cells, which were demethylated. CD3Z was also unmethylated in CD16+ NK cells, but was methylated in CD16- NK cells. The promoter regions of the CD3D, CD3E and CD3G genes are CpG sparse compared with CD3Z, which contains a CpG island that is optimally suited for designing MS-qPCR assays (FIG. 1A). For these reasons the CD3Z locus was analyzed for the development of a CD3+ T cell epigenetic marker. CD3Z is significantly overexpressed (p=0.0001; Palmer, Diehn et al. 2006) and demethylated (q=0.00026) in CD3+ T cells compared with non-T cells. Pyrosequencing of CD3Z showed the extent of differences in demethylation among immune cell lineages, which approaches complete demethylation in CD3+ T cells and nearly complete methylation in other cell lineages (FIG. 20A-B).

Bisulfite converted universal methylated DNA and DNA from purified CD3+ T cells were used to prepare a four point calibration curve to estimate CD3+ T cell numbers in mixtures of cells (FIG. 14B). The total amount of DNA was held constant across the four points. Log-linear PCR kinetics were demonstrated over a range of CD3+ T cell DNA inputs corresponding to 10 to 100000 genomic copies, indicating that the MS-qPCR assay was able to detect a few demethylated cells within a background of many thousands of methylated cells.

Whole blood samples from 46 healthy controls and 20 patients with glioma were then used to compare flow cytometry quantification of CD3+ T cells with the CD3Z MS-qPCR assay (FIG. 14C). The MS-qPCR measurements were observed to correlate highly with conventional flow measurement of T cells as a fraction of total blood leukocytes (Pearson R=0.93; F test p<2.2×10⁻¹⁶). The uniform regression and close correspondence of the two methods was true for both glioma patients (labeled “cases”) and the healthy controls. These data show that the disease process itself and treatment exposures did not influence the demethylation assay.

The correlation of CD3+ T cells detected by IHC and MS-qPCR was assessed in a set of FFPE samples; the results indicated a significant association of IHC score with CD3Z demethylation (Pearson R=0.85; F test p=3.4×10⁻¹¹; FIG. 14D). Most CD3+ TILs were CD8+ and only a few stained positively for CD4 (FIG. 19). Glioma cell lines (A172, T98G) were also studied; both expressed Foxp3 copy numbers <0.06% of total input. Analysis of two autopsy brain specimens revealed Foxp3 copy numbers <0.04% of total input. These values show limits of detection of the assay which were observed to be much lower than values observed in patient blood or tumor samples. These results demonstrate the specificity of the CD3Z epigenetic assay for detecting CD3+ immune cells within a background of tumor cells.

Example 20 Determination of T Cells and Tregs Levels in Peripheral Blood by CD3Z and FOXP3 MS-qPCR Assays in Glioma Cases and Controls

The utility of the epigenetic assays using archived frozen blood specimen samples was tested by performing a case control analysis of CD3Z and FOXP3 demethylation in glioma patients and control subjects to measure CD3+ T cell and Treg levels, respectively, in stored peripheral blood specimens from the UCSF San Francisco Adult Glioma Study (AGS). Results of MS-qPCR assays are summarized in Table 18. The total inputs of DNA from whole blood from the 94 controls and 71 glioma cases were not significantly different from each other. In patients with grade IV glioblastoma multiforme (GBM), peripheral blood CD3+ T cell levels were observed to be significantly lower (Wilcoxon p=1.7×10⁻⁹; FIG. 15A), peripheral blood Treg levels were observed to be significantly lower (Wilcoxon p=5.2×10⁻¹¹; FIG. 15B) and peripheral blood Treg/CD3+ T cell ratios were observed to be moderately lower (Wilcoxon p=0.024; FIG. 15C) compared to healthy controls. In glioma patients and control subjects, levels of T cells and Tregs were positively correlated (Pearson R=0.61, F test p<2.2×10⁻¹⁶). Use of dexamethasone or chemotherapy was not associated with T cell measures. The GBM case patients received steroid treatments prior to blood sampling. In healthy controls, but not glioma patients, people who had smoked were observed to have higher peripheral blood CD3+ T cell levels than those who had never smoked (Wilcoxon p=0.08, FIG. 16A) and current smokers had significantly higher levels of peripheral blood Tregs than former smokers (Wilcoxon p=0.01) and never smokers (Wilcoxon p=0.002; FIG. 16B). Furthermore, the ratio of Tregs/CD3+ T cells was significantly elevated in the peripheral blood of current smokers compared to former smokers (Wilcoxon p=0.01) and never smokers (Wilcoxon p=0.03) among healthy controls, and among glioma patients trended towards elevated levels in current smokers compared to former smokers (Wilcoxon p=0.17) and never smokers (Wilcoxon p=0.14; FIG. 16C).

TABLE 18 Summary of MS-qPCR measurements for samples (N = 285)

                                    Percent Demethylation, Median (Range)
Sample Description                  CD3Z               FOXP3             FOXP3/CD3Z
Blood samples (n = 165)             17.6 (2.1-44.4)    0.8 (0.06-3.2)    4.5 (0.9-20.2)
  Controls (n = 94)                 21.7 (4.7-44.4)    1.0 (0.2-3.2)     4.8 (1.0-20.2)
    Never Smokers (n = 44)          19.3 (4.7-32.1)    1.0 (0.2-2.5)     4.8 (1.0-11.7)
    Former Smokers (n = 42)         22.4 (8.8-43.4)    1.1 (0.2-2.2)     4.4 (1.8-10.5)
    Current Smokers (n = 8)         23.4 (5.7-44.4)    1.6 (0.8-3.2)     7.4 (3.6-20.2)
  Glioma Cases (n = 71)             11.2 (2.1-37.7)    0.5 (0.06-2.5)    4.1 (0.9-14.8)
    Never Smokers (n = 31)          11.3 (2.7-37.7)    0.5 (0.06-2.5)    3.8 (1.3-11.5)
    Former Smokers (n = 29)         12.7 (3.3-32.8)    0.5 (0.06-1.7)    4.1 (0.9-12.8)
    Current Smokers (n = 11)        9.6 (2.1-27.8)     0.5 (0.1-1.2)     5.1 (2.3-14.8)
    Non-GBM (n = 6)                 18.5 (3.5-26.6)    0.9 (0.2-1.6)     6.0 (3.8-7.1)
    GBM (n = 65)                    10.5 (2.1-37.7)    0.5 (0.06-2.5)    4.1 (0.9-14.8)
Excised Tumors (n = 120)            0.5 (0.03-18.7)    0.03 (0-1.5)      5.1 (0-100)
  Grades I, II & III (n = 83)       0.3 (0.03-3.9)     0.02 (0-0.5)      3.4 (0-100)
    Pilocytic Astrocytoma (n = 2)   1.4 (1.0-1.9)      0 (0-0)           0 (0-0)
    Ependymoma (n = 15)             0.5 (0.09-3.0)     0.03 (0-0.3)      3.4 (0-29.4)
    Oligodendroglioma (n = 20)      0.2 (0.04-1.6)     0 (0-0.2)         0 (0-57.3)
    Oligoastrocytoma (n = 19)       0.25 (0.04-3.9)    0.05 (0-0.4)      10.5 (0-100)
    Astrocytoma (n = 27)            0.3 (0.03-2.0)     0 (0-0.5)         0 (0-100)
  Grade IV, GBM (n = 37)            1.1 (0.17-18.7)    0.08 (0-1.5)      7.8 (0-47.4)

Example 21 Determination of T Cells and Tregs Levels in Tumor Infiltrates by CD3Z and FOXP3 MS-qPCR Assays in Excised Glioma Tumors

The demethylation assays of CD3Z and FOXP3 were used to measure levels of tumor infiltrating CD3+ T cells and Tregs, respectively, in 120 fresh frozen glioma tumors from the UCSF Brain Tumor Research Center tissue bank. Results of MS-qPCR assays are summarized in Table 18. Increased glioma tumor grade and higher levels of both CD3+ T cells (Wilcoxon p=5.7×10⁻⁷; FIG. 17A) and Tregs (Wilcoxon p=0.00014; FIG. 17B) in tumor infiltrates were observed to be significantly associated. In grade IV glioma tumor tissues the median level of Treg percentage of T cells was observed to be higher than that of control blood samples (Table 18), and higher than that of lower grade tumors (FIG. 17C). Data from MS-qPCR showed significant differences among glioma tumor histologies in levels of CD3+ T cells (Kruskal-Wallis p=8.6×10⁻⁷; FIG. 21A), Tregs (Kruskal-Wallis p=0.00011; FIG. 21B) and Treg/CD3+ T cell ratios (Kruskal-Wallis p=0.018; FIG. 21C). Poorer patient survival was associated with higher levels of tumor infiltrating CD3+ T cells (Log-Rank p-value=0.014; FIG. 22A) and Tregs (Log-Rank p-value=0.039; FIG. 22B) measured by MS-qPCR.

Example 22 Kaplan-Meier Survival Curves for Glioma Cases Show Association of Lower Treg with Improved Survival

Survival of glioma patients was correlated with the incidence of CD3+ T cells and Tregs as measured by CD3Z demethylation assays (FIG. 22A-C). Both univariate and multivariate survival analyses were performed. Kaplan-Meier survival curves for glioma cases were stratified by median values of CD3Z demethylation assays. For depicting the survival results in FIG. 22A-C, patients were divided into two groups. In each panel the top trace represents survival data of the group of patients for whom the measured variable (methylation status of CD3+ T cells, or of Tregs, or a ratio Tregs/T cells) was below the median observed for that variable, and the bottom trace represents survival data of the group of patients for whom the measured variable was above the median observed for that variable.
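The median stratification and survival estimate described above can be sketched as follows. This is a minimal illustration of the Kaplan-Meier product-limit estimator, not the survival package in R that was used herein. The function names are hypothetical, and the handling of values exactly equal to the median (assigned to the upper group) is an assumption of the sketch.

```python
from statistics import median


def split_by_median(values):
    """Return (indices below the median, indices at or above the median)."""
    cut = median(values)
    low = [i for i, v in enumerate(values) if v < cut]
    high = [i for i, v in enumerate(values) if v >= cut]
    return low, high


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times: follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored (last contact).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Pool all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

In use, the indices returned by `split_by_median` for a measured variable (for example percent CD3Z demethylation) would select the follow-up times and vital status values passed to `kaplan_meier` for each of the two groups, giving one trace per group.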

The results show that after controlling for age, gender and grade, the CD3Z demethylation assays for CD3+ and CD3+ Tregs in glioma tumor tissue were significantly associated (FIG. 22A-C) with poorer patient survival.

A CD3+ T cell CD3Z demethylation assay was performed which showed that lower CD3+ T cell/total input in glioma tumor tissue was significantly associated (FIG. 22A) with improved survival (Log-Rank p-value=0.0144). A Treg CS-DM CD3Z demethylation assay was performed which showed (FIG. 22B) that lower Treg/total input in glioma tumor tissue was significantly associated with improved survival (Log-Rank p-value=0.0385). A measurement of Treg/CD3+ T cell ratio was performed by CD3Z demethylation assay which showed (FIG. 22C) that lower Treg percentage of CD3+ T cells in glioma tumor tissue was not significantly associated with improved survival (Log-Rank p-value=0.4558).

Example 23 Cells, and Cancer Patient and Control Datasets for Determining DNA Methylation Based Epigenetic Signatures for Differentiating Patients and Controls

Sorted, normal, human peripheral blood leukocyte subtypes were isolated from whole blood by magnetic activated cell sorting (MACS) (AllCells LLC, Emeryville, CA). The purity of separated cells was confirmed with flow cytometry to be >97%. Genomic DNA was extracted and purified from cell pellets using a commercially available method (Qiagen, Valencia, Calif.), treated with sodium bisulfite (Zymo Research, Irvine, Calif.) and subjected to methylation profiling using the Infinium HumanMethylation27 BeadArray (Illumina, San Diego, Calif.). This same platform was used for the analysis of samples from the case-control studies described below.

The HNSCC data set consists of 92 incident cases from the greater Boston area and 92 cancer-free population-based control subjects from the same region (Applebaum K M et al., 2009, Int J Cancer 124:2690-2696). The clinical characteristics for this study population are contained in Table 19. The ovarian cancer data set (Teschendorff A E et al., 2009, PLoS One 4:e8274) is publicly available from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/, Accession number GSE19711), and consists of 266 postmenopausal women diagnosed with primary epithelial ovarian cancer (131 pre-treatment and 135 post-treatment cases) from the UK Ovarian Cancer Population Study (UKOPS). Controls (n=274) were cancer-free postmenopausal women for whom annual serum samples were available. To avoid potential biases due to therapy, only pre-treatment ovarian cases were included in the analysis. The bladder cancer data set (Marsit C J et al., 2011, J Clin Oncol 29:1133-1139) consists of 223 incident bladder cancer cases identified from the New Hampshire state cancer registry and 237 population controls from the same region (Karagas M R et al., 1998, Environ Health Perspect 106:1047-1050; Wallace K et al., 2009, Cancer Prev Res 2:70-73). Table 20 provides a summary of the participant characteristics.

TABLE 19 Characteristics of the study population in the HNSCC data set.

Characteristics                               Cases (n = 92)      Controls (n = 92)
Age, median years (range)                     58 (31-84)          59 (32-86)
Gender, n (%)
  Male                                        64 (69.6%)          64 (69.6%)
  Female                                      28 (30.4%)          28 (30.4%)
Smoking history, n (%)
  Never                                       17 (18.5%)          32 (34.8%)
  Former                                      59 (64.1%)          47 (51.1%)
  Current                                     16 (17.4%)          13 (14.1%)
Pack-years*, median (range)                   40.0 (0.8-135.0)    24.5 (0.5-85.0)
Alcohol history, median drinks/week (range)   15.7 (0-307.0)      5.6 (0-140.6)
HPV16 (E6, E7 or L1 seropositivity), n (%)
  Negative                                    66 (71.7%)          83 (90.2%)
  Positive                                    26 (28.3%)          9 (9.8%)
Tumor Site, n (%)
  Oral cavity                                 39 (42.4%)          -
  Pharynx                                     35 (38.0%)          -
  Larynx                                      18 (19.6%)          -
Stage, n (%)
  I                                           9 (12.5%)           -
  II                                          9 (12.5%)           -
  III                                         14 (19.4%)          -
  IV                                          40 (55.6%)          -

*Restricted to ever-smokers (current + former)

TABLE 20 Characteristics of the study population in the Bladder cancer data set.

                                      Controls         Cases
Characteristics                       No.     %        No.     %
Total No.                             237              223
Age, years
  Median                              65               66
  Range                               28-74            25-74
Sex
  Male                                158     48       171     52
  Female                              79      60       52      40
Family history of bladder cancer*
  No                                  224     53       199     47
  Yes                                 7       44       9       56
Smoking history
  Never                               72      64       40      36
  Former                              126     53       111     47
  Current                             39      35       72      66
Tumor stage/grade designation
  Carcinoma in situ                   NA               6       3
  Noninvasive low grade (grade 1-2)   NA               140     63
  Noninvasive high grade (grade 3)    NA               17      7
  Invasive                            NA               60      27

*Data on family history were not available for 13 subjects

Example 24 Statistical Analysis of Differences in Methylation Status in Leukocyte Subsets for Determining Signatures Based on Leukocyte DMRs

The analytic strategy was aimed toward examining the extent to which peripheral blood DNA methylation of non-hematopoietic cancers is driven by the epigenetic signatures that define leukocyte subtypes. Linear mixed-effects models were used to assess differences in methylation across the leukocyte subtypes and controlled for the large number of comparisons using false discovery rate (fdr) estimation. Leukocyte DMRs were subsequently ranked based on their strength of association and the highest ranking 50 DMRs were examined across the three cancer data sets between cancer cases and cancer-free controls.

An analysis was performed that capitalized on the aggregate methylation signatures across a collection of leukocyte DMRs. Each one of the full cancer data sets was split into equally sized training and testing sets. Samples in the training sets were then clustered using leukocyte DMRs. Clustering analysis was achieved using the Recursively Partitioned Mixture Model (RPMM), a hierarchical model-based method used for the clustering of array-based methylation data (Christensen B C et al., 2009, PLoS Genet. 5:e1000602; Christensen B C et al., 2011, J Natl Cancer Inst 103:143-153; Hinoue T et al., 2012, Genome Res. 22(2):271-82; Koestler D C et al., 2010, Bioinformatics 26:2578-2585). Based on the RPMM fit to the training sets, methylation class membership for the observations in the respective testing sets was predicted and the association between predicted methylation class and cancer case/control status was assessed.

The detailed statistical methodologies employed in the analysis are shown in Examples 25-26. Analyses were carried out using the R statistical package, R project for statistical computing, version 2.13, available for downloading from the internet.

Example 25 Prediction of Methylation Class Membership Based on Epigenetic Signatures from Leukocyte Derived DMRs

Genome-wide DNA methylation was profiled in 46 samples of magnetic antibody sorted, normal human peripheral blood leukocyte subtypes (including B cells, granulocytes, monocytes, NK-cells, CD4+ T cells, CD8+ T cells, and Pan-T cells; FIG. 28) using the Infinium HumanMethylation27 BeadArray. To discern leukocyte subtype DMRs, an association between methylation and leukocyte subtype for each of 26,486 autosomal CpG loci was examined. These data revealed 10,370 significantly differentially methylated CpGs among the leukocyte subtypes (fdr q-value<0.05), which were ranked by q-value (Table 22 and FIG. 24A). The highest ranking 50 DMRs (Table 21) from this ranked list were selected for use in the case-control analyses. Since the publicly available ovarian cancer data set included both pre- and post-treatment cases, only pre-treatment cases (n=131) were considered in subsequent analyses to avoid potential biases resulting from therapy. Using unconditional logistic regression models, adjusted for available and relevant confounders (FIG. 24A), a substantial proportion of the 50 selected leukocyte DMRs were found to be significantly differentially methylated between cancer cases and cancer-free controls at the α=0.05 threshold (48, 47, and 8 out of 50, permutation p-values <0.001, <0.001, and 0.085, for HNSCC, ovarian cancer, and bladder cancer, respectively; FIG. 24B).

Eight of the leukocyte DMRs that were significantly differentially methylated in cancer cases compared to controls were observed to be common to the three cancer types (FIG. 24B). In HNSCC and ovarian cancer, seven of these eight leukocyte DMRs were hypomethylated in cases relative to controls, whereas the eight DMRs were hypermethylated in bladder cancer cases relative to controls (Table 22).

To extend on the aggregate methylation signatures across a collection of leukocyte DMRs, classifiers based on profiles of leukocyte DMRs obtained from the subset analysis were developed and tested and the performance of these classifiers for successfully discriminating cancer cases from cancer-free controls was assessed. The workflow of the DMR methylation profile analysis is shown in FIGS. 29-31. For each of the three cancer data sets, a cross-validation procedure (Christensen B C et al., 2011, J Natl Cancer Inst 103:143-153) was implemented on the training sets only to determine the number of highest ranking leukocyte DMRs (M) for subsequent clustering analysis of the training sets. The highest ranking 50, 10, and 56 leukocyte DMRs from the respective cross-validation procedures using the 10,370 putative DMRs initially identified were selected to cluster the observations in the HNSCC, ovarian cancer, and bladder cancer training sets respectively. The resultant clustering solutions were used to predict methylation class membership for the subjects within the respective independent testing sets. FIG. 24A, FIG. 25A and FIG. 26A depict heat maps of the respective testing sets by predicted methylation class for each cancer data set. Methylation classes derived from leukocyte subtype DMRs were significantly associated with cancer case status within each cancer type (permutation χ² p-values <0.0001, <0.0001, 0.03, HNSCC, ovarian cancer, and bladder cancer data sets respectively), supporting the phenotypic relevance of predicted methylation classes based on leukocyte DMRs.
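The permutation χ² test of association between predicted methylation class and case/control status can be illustrated with the following sketch. It assumes that the permutation null is generated by shuffling case/control labels across subjects and that the p-value includes the usual add-one correction; neither detail is stated in this example. The function names, the number of permutations and the seed are hypothetical.

```python
import random


def chi_square_statistic(classes, statuses):
    """Pearson chi-square statistic for a class-by-status contingency table."""
    class_levels = sorted(set(classes))
    status_levels = sorted(set(statuses))
    n = len(classes)
    observed = {(c, s): 0 for c in class_levels for s in status_levels}
    for c, s in zip(classes, statuses):
        observed[(c, s)] += 1
    row = {c: sum(observed[(c, s)] for s in status_levels) for c in class_levels}
    col = {s: sum(observed[(c, s)] for c in class_levels) for s in status_levels}
    stat = 0.0
    for c in class_levels:
        for s in status_levels:
            expected = row[c] * col[s] / n
            stat += (observed[(c, s)] - expected) ** 2 / expected
    return stat


def permutation_p_value(classes, statuses, n_permutations=2000, seed=0):
    """Permutation p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed_stat = chi_square_statistic(classes, statuses)
    shuffled = list(statuses)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if chi_square_statistic(classes, shuffled) >= observed_stat:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permutations + 1)
```

With 2,000 permutations the smallest reportable p-value is about 0.0005, so reporting values such as <0.0001 would require a larger number of permutations than the default chosen here.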

For the HNSCC testing set, subjects predicted to be in the right most classes of the dendrogram (classes beginning with R) were six-fold more likely to be HNSCC cases compared to subjects in the left most classes (classes beginning with L) (OR=5.99; 95% CI [1.96, 18.36]), controlling for age, gender, smoking, alcohol consumption, and HPV serostatus. Assessing the clinical utility of the predicted methylation classes in HNSCC demonstrated that methylation classes derived from the highest ranking 50 leukocyte DMRs were highly predictive of HNSCC case/control status (area under the curve (AUC)=0.82, 95% CI [0.74, 0.91]), which increased to 0.92 (95% CI [0.87, 0.98]) with age, gender, smoking, alcohol consumption, and HPV serostatus included in the model (FIG. 24B).
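For orientation, an odds ratio and its confidence interval can be computed from a two-by-two table as sketched below. This is an unadjusted calculation using the Wald interval on the log scale, whereas the odds ratios reported herein were adjusted for covariates, so the sketch is a simplification and would not reproduce the reported values. The function name and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio_with_ci(cases_right, controls_right, cases_left, controls_left, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI for right versus left classes.

    All four cell counts must be positive.
    """
    a, b, c, d = cases_right, controls_right, cases_left, controls_left
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts of 30 cases and 10 controls in the right most classes and 16 cases and 36 controls in the left most classes, the unadjusted odds ratio is 6.75 and the interval excludes 1.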

For ovarian cancer, subjects predicted to be in the right most classes were approximately ten-fold more likely to be ovarian cancer cases compared to subjects in the left most classes (OR=9.87, 95% CI [4.63, 21.10]), controlling for age. Additionally, the predicted methylation classes in the ovarian cancer data demonstrated remarkably high sensitivity and specificity for predicting ovarian cancer case/control status (AUC=0.83, 95% CI [0.77, 0.89]), which increased to AUC=0.86, 95% CI [0.81, 0.92] with age included in the model (FIG. 25B).

In the bladder cancer data, subjects in the rightmost classes were nearly twice as likely to be bladder cancer cases compared to subjects in the leftmost classes (OR=1.94, 95% CI [0.95, 3.98], adjusted for age, gender, smoking and family history of bladder cancer). The clinical utility of the predicted methylation classes in the bladder cancer data was lower than that observed for HNSCC and ovarian cancer (bladder AUC=0.67, 95% CI [0.60, 0.73], and adjusted AUC=0.77, 95% CI [0.71, 0.83], with age, gender, smoking, and family history in the model) (FIG. 26B).

Utilizing leukocyte-derived DMRs to differentiate cases and controls resulted in methylation profiles that were consistent with, and in the case of HNSCC and ovarian tumors considerably better in prediction performance than, previously published results using the same data sets (Teschendorff A E et al., 2009, PLoS One 4:e8274; Marsit C J et al., 2011, J Clin Oncol 29:1133-1139; Langevin S M et al., Epigenetics. 2012 March; 7(3):291-9). For the HNSCC and ovarian data sets there was a high degree of correlation in the methylation status of leukocyte DMRs and CpG loci identified by previous analytic strategies (Langevin S M et al., Epigenetics. 2012 March; 7(3):291-9; mean absolute Spearman correlations=0.68 and 0.75, respectively; FIG. 27A and FIG. 27B). In contrast, the highest ranking 56 DMRs in the bladder data set were found to be less correlated with the CpG loci used to form the methylation classes in a previous study using the same data set (mean absolute Spearman correlation=0.11; FIG. 27C).

TABLE 21 The highest ranking 50 differentially methylated regions (DMRs) among the leukocyte subtypes (false discovery rate q-values < 0.001 for all)

| CpG Name | Chromosome | Gene Name | F-statistic |
|---|---|---|---|
| cg03801286 | 21 | KCNE1 | 373.63 |
| cg25634666 | 11 | FOLR3 | 369.50 |
| cg24777950 | 14 | CTSG | 350.66 |
| cg17356733 | 21 | IFNGR2 | 291.97 |
| cg02497428 | 16 | IGSF6 | 291.35 |
| cg24211388 | 6 | AIF1 | 285.92 |
| cg03330678 | 17 | SEPT9 | 284.79 |
| cg00546897 | 21 | LOC284837 | 279.64 |
| cg24841244 | 11 | CD3D | 271.62 |
| cg11283860 | 1 | SLC45A1 | 271.09 |
| cg27485921 | 2 | ATP6V1E2 | 267.19 |
| cg00974864 | 1 | FCGR3B | 260.62 |
| cg07730301 | 11 | ALDH3B1 | 252.52 |
| cg07728874 | 11 | CD3D | 250.67 |
| cg17496921 | 19 | TSPAN16 | 246.58 |
| cg26661623 | 17 | ASGR2 | 242.83 |
| cg18920397 | 1 | LY9 | 238.64 |
| cg27461196 | 19 | FXYD1 | 236.64 |
| cg20720686 | 7 | POR | 232.23 |
| cg09303642 | 12 | NFE2 | 231.34 |
| cg23140706 | 12 | NFE2 | 224.95 |
| cg08458487 | 10 | SFTPD | 217.67 |
| cg20748065 | 7 | POR | 217.63 |
| cg18589858 | 11 | SLCO2B1 | 217.14 |
| cg10287137 | 11 | P2RY2 | 215.31 |
| cg25587233 | 9 | PPP2R4 | 207.25 |
| cg08044694 | 19 | BRD4 | 202.50 |
| cg18084554 | 19 | ARID3A | 198.61 |
| cg13650156 | 7 | PILRA | 197.87 |
| cg18854666 | 2 | SLC11A1 | 197.42 |
| cg17173423 | 11 | MS4A3 | 195.50 |
| cg22242539 | 17 | SERPINF1 | 194.11 |
| cg02780988 | 17 | KRTHA6 | 193.25 |
| cg10266490 | 1 | ACOT11 | 192.62 |
| cg27606341 | 5 | FYB | 191.23 |
| cg15512851 | 6 | FGD2 | 185.34 |
| cg20070090 | 1 | S100A8 | 183.43 |
| cg11058932 | 7 | TSGA13 | 183.31 |
| cg13500819 | 5 | PACAP | 182.82 |
| cg15880738 | 11 | CD3G | 182.73 |
| cg07285167 | 1 | CSF3R | 182.16 |
| cg09868035 | 20 | C20orf135 | 179.56 |
| cg01980222 | 6 | TREM2 | 178.94 |
| cg21019522 | 11 | SLC22A18 | 176.20 |
| cg16097772 | 12 | LYZ | 172.89 |
| cg21969640 | 12 | GPR84 | 172.51 |
| cg12971694 | 9 | CD72 | 172.43 |
| cg22224704 | 11 | GSTP1 | 172.40 |
| cg07239938 | 19 | ELA2 | 170.70 |
| cg02240622 | 15 | PLCB2 | 169.99 |

TABLE 22 Methylation differences between cancer cases and controls for the eight overlapping differentially methylated leukocyte DMRs. Mean delta-beta refers to the difference in mean methylation between cancer cases and controls (i.e. βcases − βcontrols). Values are mean delta-beta (95% CI).

| Gene Locus | HNSCC | Ovarian | Bladder |
|---|---|---|---|
| C20orf135 | −0.05 (−0.07, −0.03) | −0.06 (−0.08, −0.05) | 0.02 (0.00, 0.04) |
| PACAP | 0.02 (0.00, 0.04) | 0.04 (0.02, 0.05) | 0.02 (0.00, 0.04) |
| FGD2 | −0.05 (−0.07, −0.03) | −0.06 (−0.07, −0.04) | 0.02 (0.01, 0.04) |
| SLC22A18 | −0.05 (−0.07, −0.04) | −0.05 (−0.06, −0.04) | 0.02 (0.01, 0.04) |
| GSTP1 | −0.05 (−0.07, −0.04) | −0.06 (−0.07, −0.05) | 0.02 (0.01, 0.04) |
| NFE2 | −0.04 (−0.05, −0.03) | −0.04 (−0.05, −0.03) | 0.02 (0.00, 0.03) |
| ASGR2 | −0.06 (−0.08, −0.04) | −0.05 (−0.07, −0.04) | 0.02 (0.01, 0.04) |
| SLC11A1 | −0.05 (−0.07, −0.04) | −0.05 (−0.06, −0.04) | 0.02 (0.00, 0.04) |

Example 26 Statistical Analysis of Methylation Differences in Leukocyte DMRs Between Cancer Cases and Cancer-Free Controls for Determining Epigenetic Signatures Specific to Each Group

Linear mixed-effects models were used to assess differences in methylation across the leukocyte subtypes, modeling arcsine square-root transformed methylation as the response, leukocyte subtype as a fixed effect covariate, and a random effect term for plate/BeadChip. False discovery rate (FDR) estimation was used to control for the large number of comparisons, and putative leukocyte DMRs were defined as those with FDR q-value<0.05. Leukocyte DMRs were then ranked by strength of association using the F-statistics that resulted from the respective linear mixed-effects models.
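The response transformation named above can be illustrated with a minimal sketch. The function name `arcsine_sqrt` is hypothetical and is not taken from the analysis code; the sketch only assumes that methylation is expressed as a proportion between 0 and 1.

```python
import math

def arcsine_sqrt(beta):
    """Arcsine square-root transform of a methylation proportion in [0, 1].

    Hypothetical helper: the transform stabilises the variance of
    proportion data before a linear model is fitted.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return math.asin(math.sqrt(beta))

# Transform a few illustrative (made-up) methylation values.
transformed = [arcsine_sqrt(b) for b in (0.0, 0.25, 0.5, 1.0)]
```

The transformed values range from 0 to π/2, and the transformed response is what would enter the mixed-effects model in place of the raw proportion.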

Methylation differences among the highest ranking 50 leukocyte DMRs were examined between cancer cases and cancer-free controls using a series of unconditional logistic regression models that were adjusted using available and relevant covariate information. A leukocyte DMR was considered differentially methylated if the nominal p-value from the unconditional logistic regression model was less than 0.05. Permutation tests were then applied to each of the three data sets to determine whether the number of differentially methylated leukocyte DMRs was significantly greater than expected by chance. Specifically, samples were randomly permuted (same permutation across the highest ranking 50 DMRs) and an unconditional logistic regression model was fit to the resampled data. For each data set, 1000 permutations were used to generate a null distribution of the number of differentially methylated leukocyte DMRs. Permutation p-values were then obtained by comparing the observed number of differentially methylated leukocyte DMRs to the respective null distribution.
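The permutation scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: a Welch-type two-sample statistic with a fixed threshold stands in for the covariate-adjusted unconditional logistic regression, the add-one correction in the p-value is a common convention that the text does not specify, and all function names are hypothetical.

```python
import random
import statistics

def count_significant(values_by_dmr, labels, threshold=1.96):
    """Count DMRs whose case/control statistic exceeds the threshold.

    Stand-in test: a Welch-type statistic replaces the adjusted logistic
    regression model. Each group needs at least two samples.
    """
    count = 0
    for values in values_by_dmr:
        cases = [v for v, lab in zip(values, labels) if lab == 1]
        controls = [v for v, lab in zip(values, labels) if lab == 0]
        se = (statistics.variance(cases) / len(cases)
              + statistics.variance(controls) / len(controls)) ** 0.5
        diff = abs(statistics.mean(cases) - statistics.mean(controls))
        if se > 0 and diff / se > threshold:
            count += 1
    return count

def permutation_p_value(values_by_dmr, labels, n_perm=1000, seed=0):
    """Compare the observed count with its label-permutation null."""
    rng = random.Random(seed)
    observed = count_significant(values_by_dmr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one permutation shared by every DMR
        if count_significant(values_by_dmr, shuffled) >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling the labels once per iteration, rather than once per DMR, preserves the correlation among DMRs, which is why the same permutation is applied across all of them.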

The leukocyte DMR profile analysis involved splitting the full cancer data sets into equally sized training and testing sets (FIGS. 29-32). Samples in the training set were clustered using the highest ranking M leukocyte DMRs, where M was determined from the total pool of putative DMRs using the previously described cross-validation procedure (Sincic N and Herceg Z, 2011, Curr Opin Oncol 23:69-76). Clustering analysis was achieved using the Recursively Partitioned Mixture Model (RPMM), a hierarchical model-based method for clustering that has been extensively used for the clustering of array-based methylation data (Cui H M, 2007, Dis Markers 23:105-112; Wilhelm-Benartzi C S et al., 2010, Carcinogenesis 31:1972-1976; Schwartzman J et al., 2011, Epigenetics 6:1248-1256). Based on the RPMM fit to the training data, a naive Bayes classifier was used to predict methylation class membership for the observations in the independent testing set. Associations between predicted methylation class and cancer case/control status were assessed using permutation χ² tests and unconditional logistic regression models adjusted for available and relevant confounders. The clinical utility of the identified methylation classes was investigated using receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC).
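The AUC used above to summarize clinical utility can be computed directly from predicted scores. The following is a minimal sketch, assuming each subject has a numeric score (for example, a predicted probability of being a case) and a 0/1 case label; the function name `roc_auc` is hypothetical, and confidence intervals are not covered.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise case/control comparisons.

    Equals the probability that a randomly chosen case scores higher
    than a randomly chosen control, counting ties as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Illustrative (made-up) scores: two cases followed by two controls.
auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls.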

Pairwise Spearman correlation coefficients were computed between the highest ranking M leukocyte DMRs and the CpG loci identified from the corresponding semi-supervised RPMM (SS-RPMM) analysis of the HNSCC, ovarian, and bladder cancer data sets. A diagram illustrating the analytic framework for SS-RPMM is provided in FIG. 32. Briefly, SS-RPMM is a statistical methodology for identifying classes of methylation that are associated with a phenotype of interest and has been successfully applied in several settings (Christensen B C et al. 2009, Cancer Res 69:227-234; Marsit C J et al., 2006, Cancer Res 66:10621-10629).

The same training and testing sets were used for the HNSCC and bladder cancer data sets as were used in the references Langevin S M et al., Epigenetics. 2012 March; 7(3):291-9 and Christensen B C et al., 2009, Cancer Res 69:227-234, to compare the results of the present analysis to previously published results, and to provide additional insight with respect to the findings of those studies. The ovarian cancer data set was also analyzed using the SS-RPMM strategy described in Langevin S M et al., Epigenetics. 2012 March; 7(3):291-9 and Christensen B C et al., 2009, Cancer Res 69:227-234, and the results are shown in FIG. 33. Following the logic above, the training sets used for the SS-RPMM analysis were applied to the leukocyte DMR profile analysis of the ovarian data.

Analyses were carried out using the R statistical package (R Project for Statistical Computing, version 2.13), which is available for downloading from the internet.

Example 27 Methylation Analysis by DNA Methylation Microarray for NK Cell Specific DMR

Normal human peripheral blood leukocytes were isolated by magnetic activated cell sorting (MACS; Miltenyi Biotec Inc., Auburn, Calif.) and purity was confirmed by fluorescence activated cell sorting (FACS). The major cell types obtained included NK cells (n=9), B cells (n=5), T cells (n=16), monocytes (n=5), and granulocytes (n=8). DNA and RNA were co-extracted from MACS sorted leukocytes using the AllPrep DNA/RNA mini kit (Qiagen Inc., Valencia, Calif.). DNA from archived blood was extracted with the DNeasy Blood & Tissue kit (Qiagen Inc., Valencia, Calif.). DNA was treated with sodium bisulfite according to the EZ DNA Methylation Kit (Zymo Research Corporation, Irvine, Calif.).

Methylation analysis was performed using the Infinium® HumanMethylation27 BeadChip Microarray (Illumina Inc., San Diego, Calif.), which quantifies the methylation status of 27,578 CpG loci from 14,495 genes, with a redundancy of 15-18 fold. The ratio of fluorescent signals was computed from both alleles using the following equation: $\beta = \max(M, 0) / (|U| + |M| + 100)$, where M and U are the methylated and unmethylated signal intensities, respectively. The resultant β-value is a continuous variable ranging from 0 (unmethylated) to 1 (completely methylated) that represents the methylation at each CpG site and is used in subsequent statistical analyses. Data were assembled with the methylation module of GenomeStudio software (Illumina, Inc., San Diego, Calif.; Bibikova M et al., 2009, Epigenomics 1:177-200).
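The β-value calculation can be illustrated with a short sketch. It assumes the offset of 100 belongs in the denominator, which is the only reading under which β stays between 0 and 1 as the text states; the function and argument names are hypothetical, and the intensities in the example are made up.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation beta-value from the two allele signal intensities.

    Assumes beta = max(M, 0) / (|U| + |M| + offset); the offset guards
    against division by very small total intensities.
    """
    numerator = max(methylated, 0.0)
    denominator = abs(unmethylated) + abs(methylated) + offset
    return numerator / denominator

# Illustrative intensities: a mostly methylated and a mostly unmethylated locus.
high = beta_value(methylated=9000.0, unmethylated=900.0)
low = beta_value(methylated=300.0, unmethylated=9600.0)
```

Because of the offset, β never quite reaches 1 even for a fully methylated locus, and a locus with no signal at all yields 0 instead of an undefined ratio.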

Example 28 Validation of DNA Methylation Microarray Results for Identifying NK Cell-Specific DMRs by Pyrosequencing

Pyrosequencing assays to validate microarray results were designed using Pyromark Assay Design 2.0 (Qiagen Inc., Valencia, Calif.), and carried out on a Pyromark MD pyrosequencer running Pyromark qCpG 1.1.11 software (Qiagen Inc., Valencia, Calif.). Oligonucleotide primers were obtained from Life Technologies™ (Grand Island, N.Y.).

Example 29 Protein Expression Analysis by mRNA Expression Array for Identifying NK Cell-Specific DMRs

The Whole-Genome DASL HT Assay Kit (Illumina Inc., San Diego, Calif.) was used to obtain simultaneous profiles of more than 29,000 mRNA transcripts. Data were assembled with the expression module of GenomeStudio software (Illumina Inc., San Diego, Calif.). The mRNA expression array data were used in combination with DNA methylation array data to identify NK cell-specific DNA methylation.

Example 30 Methylation Specific Quantitative Polymerase Chain Reaction (MS-qPCR) Analysis for Quantification of NKp46 Demethylation

Primers and TaqMan minor groove binder (MGB) probes (Table 23) with 5′ 6-FAM (6-Carboxyfluorescein) and 3′ non-fluorescent quencher (NFQ), as well as TaqMan® 1000 RXN Gold with Buffer A Pack, were obtained from Life Technologies™ (Grand Island, N.Y.). MS-qPCR was performed using solutions and conditions according to Campan M et al., 2009, Methods Mol Biol, 507:325-37 with the following modifications. A solution of 10× TaqMan® Stabilizer containing 0.1% Tween-20 and 0.5% gelatin was prepared weekly. Each reaction of 20 μl contained 5 μl DNA, 11.9 μl preMix, 3 μl oligoMix, and 0.1 μl Taq DNA polymerase. Cycling was performed using a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, Calif.): a 10 min preheat at 95° C. followed by 50 cycles of 95° C. for 15 sec and 60° C. for 1 min. Samples were run in triplicate using the absolute quantification method.

TABLE 23 MS-qPCR oligonucleotide sequences

| Oligonucleotide name | Sequence | SEQ ID NO |
|---|---|---|
| NKp46 forward primer | ATTAGGTTGGTAGAATTTGAGT | 116 |
| NKp46 reverse primer | CCCATTCCCCTTCCACA | 117 |
| NKp46 probe | (6FAM)CTCACCAACACAAAACAA(MGB, NFQ) | 118 |
| C-less forward primer | TTGTATGTATGTGAGTGTGGGAGAGA | 97 |
| C-less reverse primer | TTTCTTCCACCCCTTCTCTTCC | 98 |
| C-less probe | (6FAM)CTCCCCCTCTAACTCTAT(MGB, NFQ) | 99 |

MGB: minor groove binder. FAM: 6-Carboxyfluorescein. NFQ: non-fluorescent quencher. C-less qPCR assay: Campan M et al., 2009, Methods Mol Biol, 507:325-37; Weisenberger D J et al., 2008, Nucleic Acids Res 36:4689-98.

Quantification of total bisulfite converted DNA copies was performed by reference to the C-less qPCR assay (Campan M et al., 2009, Methods Mol Biol, 507:325-37; Weisenberger D J et al., 2008, Nucleic Acids Res 36:4689-98). C-less primers and probes recognize a DNA sequence without cytosines; hence, the assay amplifies the total amount of DNA in a PCR reaction regardless of bisulfite conversion or methylation status. To calculate copy number, a conversion factor of 6.6 picograms (pg) of DNA per diploid human cell (3.3 pg per copy) was used.
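As a worked illustration of the conversion factor above, the following sketch turns a DNA mass into genome copies at 3.3 pg per copy. The function name and the choice of nanograms as the input unit are assumptions made for the example.

```python
PG_PER_COPY = 3.3      # picograms of DNA per haploid genome copy (from the text)
PG_PER_CELL = 6.6      # picograms of DNA per diploid human cell (from the text)

def genome_copies(dna_mass_ng):
    """Convert a DNA mass in nanograms to genome copies (hypothetical helper)."""
    return dna_mass_ng * 1000.0 / PG_PER_COPY

def diploid_cells(dna_mass_ng):
    """Convert a DNA mass in nanograms to diploid cell equivalents."""
    return dna_mass_ng * 1000.0 / PG_PER_CELL

# 99 ng of genomic DNA corresponds to about 30,000 copies (15,000 cells).
copies = genome_copies(99.0)
cells = diploid_cells(99.0)
```

Under this factor, the 30,000-copy top standard described in this example corresponds to roughly 99 ng of genomic DNA.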

Normal human blood DNA quantified by UV absorption (Nanodrop, Inc) was used to generate a four-point standard curve with 30,000 copies, 3,000 copies, 300 copies and 30 copies of genomic DNA. This standard curve was included on each sample plate to obtain quantification of DNA from Ct values. Since C-less primers hybridize to both strands of the standard DNA (non-bisulfite converted) but to only a single strand of bisulfite converted samples during the first cycle, the resultant copy number obtained from bisulfite treated samples was multiplied by two. Bisulfite converted, universal methylated DNA standard (Zymo Research Corporation, Irvine, Calif.) and bisulfite converted, isolated NK cell DNA were quantified at the same time using the C-less assay. Resultant copy number measurements were used to prepare a calibration curve for the NKp46 demethylation assay. NK cell DNA in known copy numbers was spiked into universal methylated DNA in ratios that maintained a constant total number of DNA copies (10,000 copies) in each reaction across the dilution scheme. This mimics conditions for detecting different relative numbers of NK cells within a complex mixture of cells in a biological sample. For absolute quantification of NKp46 demethylation, the four-point standard curve used 10,000 copies, 1,000 copies, 100 copies, and 10 copies of bisulfite converted NK cell DNA.
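The use of a four-point standard curve to read copy numbers from Ct values can be sketched as below. The Ct values are invented for illustration, the function names are hypothetical, and an ordinary least-squares line of Ct against log10(copies) is assumed; the factor of two reflects the single-strand correction described above for bisulfite treated samples in the C-less assay.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept, bisulfite_converted=True):
    """Invert the standard curve; double the result for bisulfite treated DNA."""
    copies = 10 ** ((ct - intercept) / slope)
    return 2 * copies if bisulfite_converted else copies

# Hypothetical Ct readings for the 30,000 / 3,000 / 300 / 30 copy standards.
standards = [30000, 3000, 300, 30]
cts = [25.0, 28.3, 31.6, 34.9]
slope, intercept = fit_standard_curve(standards, cts)
sample_copies = copies_from_ct(28.3, slope, intercept)
```

A slope near −3.3 cycles per ten-fold dilution corresponds to close to 100% amplification efficiency, which is one way to sanity-check a plate before reading off sample values.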

Example 31 Statistical Modeling of the DNA Methylation Microarray Data for Estimation of Differential Methylation

A linear mixed effects model was applied to the Illumina Infinium® HumanMethylation27 data using SAS (SAS Institute Inc., Cary, N.C.). Cell type was designated as the fixed effect and beadchip plate was the random effect. For this example, the fixed effect groups were NK cells and non-NK cells, which included pan T lymphocytes, CD4+ T-lymphocytes, Tregs, CD8+ T-lymphocytes, B-lymphocytes, granulocytes and monocytes. Coefficients that estimated differential methylation were generated such that, for any particular locus, a negative coefficient indicated less methylation in NK cells than in the other cell types. Resultant p-values were adjusted for multiple comparisons using the “qvalue” package in the software, the R project for statistical computing available for downloading from the internet.

Example 32 Statistical Modeling of the RNA Expression Array for Estimation of Differential RNA Expression

Linear models were applied to the Illumina Whole-Genome DASL HT data using the “limma” package in the software, the R project for statistical computing. RNA expression for MACS isolated NK cells was compared to each of the following MACS isolated leukocytes: pan T-lymphocytes, CD4+ T-lymphocytes, Tregs, CD8+ T-lymphocytes, B lymphocytes, granulocytes and monocytes. Thus, estimates were obtained for log-fold changes in RNA expression between NK cells and each of the aforementioned cell types, in which a positive value indicated higher RNA expression in NK cells compared to a particular cell type. Resultant p-values were adjusted for multiple comparisons using the “qvalue” package in R project for statistical computing. NK cell specific differential RNA expression was considered significant only if the seven q-values were each less than 0.1.

Example 33 Statistical Analysis of the (MS-qPCR) Data

Statistical analyses were carried out in R project for statistical computing. A generalized linear model analysis and F-test were performed to determine log-linear PCR kinetics for the NK cell standard curve. To test for univariate associations between continuous NKp46 demethylation measurements and discrete variables, Wilcoxon rank sum tests (for dichotomous variables, such as case status) and Kruskal-Wallis one-way analysis of variance tests were employed. To test for univariate associations between continuous NKp46 demethylation and other continuous variables, linear regression analysis, calculation of Pearson product-moment correlations and F-tests were performed. A chi-squared test for trends in proportions was applied to identify trends in HNSCC prevalence by control-determined demethylation tertiles. Multivariate logistic regression analyses were performed using the “glm” function with family set to binomial.

Example 34 NKp46 Demethylation is a Biomarker of NK Cells

DNA methylation and RNA expression microarray data from MACS isolated (FACS validated) normal human leukocytes were integrated to identify putative NK cell-specific DMRs that could potentially serve as reliable biomarkers of the cell type. The list of candidate gene regions was narrowed to CpG loci that were significantly demethylated in NK cells (q<0.1, coefficient<0) and that were located within genes whose RNA expression was significantly elevated in NK cells (q<0.1, log fold-change>1). These candidates are marked as darkened asterisks in the top left quadrant of FIG. 34. Pyrosequencing and MS-qPCR of bisulfite converted DNA from the MACS isolated leukocytes confirmed that a region near the promoter of NKp46 is demethylated in NK cells, and is methylated in T cells, B cells, granulocytes, and monocytes (FIGS. 35 and 38). Furthermore, the CD56^(dim) subset of NK cells showed complete demethylation in the NKp46 region, whereas CD56^(bright) NK cells exhibited only partial demethylation in the region as measured by MS-qPCR. The NKp46 MS-qPCR assay was optimized to fit a log-linear relationship between lower Ct values (more demethylated copies of NKp46) and increased NK cell DNA content (Pearson R=−0.996, p<2.2×10⁻¹⁶; FIG. 36).
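The candidate filter described above combines a methylation criterion with an expression criterion. The sketch below is a simplified illustration: the `Locus` record and its field names are hypothetical, and a single expression q-value and log fold-change stand in for the seven pairwise comparisons used in the actual expression analysis.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    """Hypothetical per-locus summary of both array analyses."""
    name: str
    meth_coef: float     # negative = less methylation in NK cells
    meth_q: float        # q-value of the methylation coefficient
    expr_log_fc: float   # log fold-change of expression, NK vs. other cells
    expr_q: float        # q-value of the expression difference

def is_candidate(locus, q_cutoff=0.1, log_fc_cutoff=1.0):
    """Demethylated in NK cells and located in a gene with elevated expression."""
    demethylated = locus.meth_q < q_cutoff and locus.meth_coef < 0
    expressed = locus.expr_q < q_cutoff and locus.expr_log_fc > log_fc_cutoff
    return demethylated and expressed

# Made-up loci: only the first satisfies both criteria.
loci = [
    Locus("locus_a", -0.40, 0.01, 2.3, 0.02),
    Locus("locus_b", 0.20, 0.01, 2.3, 0.02),
    Locus("locus_c", -0.40, 0.01, 0.4, 0.02),
]
candidates = [locus.name for locus in loci if is_candidate(locus)]
```

Requiring both conditions restricts the list to loci where loss of methylation coincides with higher expression, which is the top left quadrant when the methylation coefficient is plotted against the expression change.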

Example 35 Samples from HNSCC Patients have Diminished Circulating NK Cells

The calibrated NKp46 MS-qPCR assay was used to measure the level of circulating NK cells in the peripheral blood of patients with HNSCC and cancer-free controls. The demographics of the study population are shown in Table 24.

Univariate analysis revealed that significantly fewer demethylated copies of NKp46 were detected in HNSCC blood than in control blood (p<0.0001, FIG. 39), indicative of a diminished NK cell compartment in the peripheral blood of HNSCC patients. There was no significant univariate association observed between the measured number of demethylated NKp46 copies and age, gender, HPV16 (E6 and/or E7) serology, cigarette smoking, alcohol consumption, or body mass index. There was no significant difference in the number of demethylated NKp46 copies detected among patients with oral, pharyngeal, and laryngeal tumors.

To determine whether the observed association between NK cells and case status was attributable to systemic chemotherapy or other treatments, the number of demethylated NKp46 copies detected in case blood samples drawn within one month of diagnosis was compared to the number detected in samples drawn more than one month after diagnosis, and no significant difference was observed.

TABLE 24 Demographic characteristics

| Characteristic | Total (N = 244) | Controls (n = 122) | HNSCC (n = 122) | Oral (n = 43) | Pharyngeal (n = 53) | Laryngeal (n = 26) |
|---|---|---|---|---|---|---|
| Age | | | | | | |
| Mean (SD) | 61 (12) | 62 (12) | 61 (12) | 60 (15) | 60 (10) | 64 (9.5) |
| Median (Range) | 60 (29-87) | 60 (31-87) | 60 (29-86) | 59 (29-86) | 60 (41-86) | 64 (50-83) |
| Gender | | | | | | |
| Male, No. (%) | 178 (73%) | 89 (73%) | 89 (73%) | 27 (63%) | 41 (77%) | 21 (81%) |
| Female, No. (%) | 66 (27%) | 33 (27%) | 33 (27%) | 16 (37%) | 12 (23%) | 5 (19%) |
| HPV 16 Serology | | | | | | |
| L1+, No. (%) | 33 (14%) | 4 (3%) | 29 (24%) | 6 (14%) | 22 (42%) | 1 (4%) |
| E6+, No. (%) | 41 (17%) | 4 (3%) | 37 (30%) | 2 (5%) | 32 (60%) | 3 (12%) |
| E7+, No. (%) | 28 (11%) | 2 (2%) | 26 (21%) | 1 (2%) | 23 (43%) | 2 (8%) |
| E6+ and E7+, No. (%) | 25 (10%) | 0 (0%) | 25 (20%) | 0 (0%) | 23 (43%) | 2 (8%) |
| E6+ or E7+, No. (%) | 44 (18%) | 6 (5%) | 38 (31%) | 3 (7%) | 32 (60%) | 3 (12%) |
| Cigarette Smoking Status | | | | | | |
| Never, No. (%) | 65 (27%) | 41 (34%) | 24 (20%) | 11 (26%) | 11 (21%) | 2 (8%) |
| Former, No. (%) | 149 (61%) | 66 (54%) | 83 (68%) | 29 (67%) | 35 (66%) | 19 (73%) |
| Current, No. (%) | 30 (12%) | 15 (12%) | 15 (12%) | 3 (7%) | 7 (13%) | 5 (19%) |
| Cigarette Pack-Years | | | | | | |
| Mean (SD) | 26 (29) | 17 (23) | 35 (32) | 26 (27) | 36 (35) | 45 (30) |
| Median (Range) | 16 (0-116) | 7 (0-114) | 31 (0-116) | 20 (0-105) | 33 (0-116) | 45 (0-96) |
| Alcohol Drinks per Week | | | | | | |
| Mean (SD) | 18 (26) | 15 (27) | 21 (24) | 18 (23) | 22 (25) | 23 (25) |
| Median (Range) | 7 (0-199) | 6 (0-199) | 14 (0-155) | 7 (0-90) | 18 (0-155) | 19 (0-113) |

The NKp46 MS-qPCR measurements from cancer-free control blood samples were used to determine suitable cutoffs for NKp46 demethylation tertiles. The proportion of total HNSCC cases decreased significantly with increasing demethylation tertile (p<0.001, FIG. 37), indicating that HNSCC patients are more likely to have depressed levels of NK cells in their peripheral blood. The trend held true independent of case stratification by HPV16 (E6 and/or E7) serology, or by whether blood was drawn within one month of diagnosis or later. Multivariate logistic regression controlling for age, gender, cigarette smoking, alcohol consumption, BMI, and HPV16 (E6 and/or E7) serology confirmed increased HNSCC risk for individuals in the lower two normal NKp46 demethylation tertiles (Table 25), indicating that lower levels of NK cells in the peripheral blood are significantly associated with HNSCC.

TABLE 25 Logistic regression of HNSCC risk

| NKp46 demethylation tertile | Crude OR (95% CI) | Crude p-value | Adjusted* OR (95% CI) | Adjusted* p-value |
|---|---|---|---|---|
| 1st (lowest) | 4.3 (2.2, 9.0) | 5.0 × 10⁻⁵ | 5.6 (2.0, 17.4) | 0.002 |
| 2nd (middle) | 2.8 (1.4, 6.0) | 0.006 | 4.9 (1.8, 16.1) | 0.004 |
| 3rd (highest) | Reference | | Reference | |

*Unconditional multivariate model controlling for age, gender, smoking, drinking, BMI and HPV16 (E6 and/or E7) serology

Example 36 Application of the Methodology to mRNA Data

The statistical methods described herein for determining changes in the distribution of white blood cells among different subpopulations are applicable to mRNA expression profiles with the following considerations. A mathematical consideration is that mRNA is typically analyzed on a logarithmic scale, yet the assumptions of the methods herein involve linearity on an arithmetic scale, since the mixing coefficients are assumed to act linearly on absolute numbers of nucleic acid molecules; thus, the proposed methods would require analysis of untransformed fluorescence intensities, for which skewed distributions would result in numerical instabilities. A biological consideration is the absence of a linear relationship between cell number and mRNA copies, since proteins may be translated as a consequence of an initial burst of mRNA transcription upon cellular development, followed by significant mRNA degradation. In contrast, one would expect the average beta value provided by Illumina bead-array products, as well as similarly constructed quantities from other platforms, to scale in proportion to the actual fraction of methylated nucleic acids, with a biologically reasonable assumption of two DNA molecules per cell.

An example of an application of methods herein is shown using mRNA data. The validation data set S₀ was obtained from Watkins N A et al., 2009, Blood 113: e1-e9, in which the Illumina Human-6 v2 Expression BeadChip was used to characterize the mRNA expression profile of eight types of blood cells: B cells, granulocytes, erythroblasts, megakaryocytes, monocytes, natural killer cells, CD4+ T cells, and CD8+ T cells. For this analysis erythroblasts (nucleated progenitors of red blood cells) and megakaryocytes (progenitors of platelets) were removed. The target data set S₁ was obtained from Showe M K et al., 2009, Cancer Res 69: 9202-10, in which the same mRNA expression platform was used to characterize expression differences in isolated mononuclear cells between non-small cell lung cancer (NSCLC) cases and controls having non-cancer lung disease, adjusting for age, sex and smoking. In addition, data were presented from 18 matched case samples, pre- and post-operative.

The same methodology was used as for the DNA methylation data sets herein, ordering the 46,693 transcripts by F statistic according to their ability to distinguish six types of leukocytes. Of the 100 transcripts having the largest F statistics, it was observed that 86 overlapped with the transcripts in Showe M K et al., 2009, Cancer Res 69: 9202-10. Thus, the remainder of the analysis was carried out using the 86 overlapping loci. In the analyses, untransformed data (i.e. using either the normalized fluorescence intensities or 2 raised to the power of the normalized log₂ intensities) were used. Application of the constrained projection in Examples 1 and 5 resulted in average percentage estimates consistent with mononuclear cells (i.e. a subfraction with most granulocytes removed): 3.3% B cell, 3.4% granulocyte, 18.1% monocyte, 29.5% NK cell, 11.6% CD4+ T cell, and 2.2% CD8+ T cell.

Table 26 presents results from 137 NSCLC cases and 91 controls, adjusted for age, sex, and smoking status. Table 27 presents results from 18 matched pre-operative and post-operative samples from NSCLC cases, where the analyzed outcome was the difference in untransformed expression (post-operative expression minus pre-operative expression), and the coefficients displayed correspond to the intercept of B₁ (analogous to a paired t-test). Perturbations in T cell distribution were consistent with known immunological changes resulting from NSCLC (Ginns L C et al., 1982, Am Rev Respir Dis 23: 265-9; Mazzoccoli G et al., 1999, In Vivo 13: 205-9), as well as with age and smoking. The perturbations and coefficient signs were reasonable; the magnitudes were potentially biased. For example, the estimates corresponding to granulocyte distribution were much larger than expected given the relatively small number of granulocytes present in a mononuclear subfraction. Thus, the methods herein were determined to be suitable for application to mRNA data sets.

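TABLE 26 White blood cell distribution comparing cases to controls in NSCLC mRNA data set

| Covariate | Cell type | Est | SE₂ | p-value |
|---|---|---|---|---|
| Case Status | B Cell | 0.8 | 4.15 | 0.8511 |
| Case Status | Granulocyte | −34.6 | 9.48 | 0.0003 |
| Case Status | Monocyte | 17.9 | 9.58 | 0.0613 |
| Case Status | NK | 1.3 | 5.18 | 0.8095 |
| Case Status | T Cell (CD4+) | 24.9 | 9.01 | 0.0057 |
| Case Status | T Cell (CD8+) | −15.2 | 9.03 | 0.0931 |
| Age (decades) | B Cell | −0.7 | 1.36 | 0.5824 |
| Age (decades) | Granulocyte | −7.9 | 3.45 | 0.0218 |
| Age (decades) | Monocyte | −6.5 | 2.76 | 0.0180 |
| Age (decades) | NK | −4.0 | 1.80 | 0.0255 |
| Age (decades) | T Cell (CD4+) | 13.0 | 2.89 | 0.0000 |
| Age (decades) | T Cell (CD8+) | 8.3 | 2.96 | 0.0052 |
| Sex (male) | B Cell | 0.1 | 2.66 | 0.9827 |
| Sex (male) | Granulocyte | −34.8 | 6.41 | 0.0000 |
| Sex (male) | Monocyte | 6.8 | 5.44 | 0.2091 |
| Sex (male) | NK | −7.8 | 3.32 | 0.0193 |
| Sex (male) | T Cell (CD4+) | 21.1 | 5.39 | 0.0001 |
| Sex (male) | T Cell (CD8+) | 13.2 | 5.76 | 0.0223 |
| Former Smoker | B Cell | 1.6 | 3.97 | 0.6821 |
| Former Smoker | Granulocyte | 17.2 | 8.25 | 0.0375 |
| Former Smoker | Monocyte | 6.1 | 7.84 | 0.4368 |
| Former Smoker | NK | 2.7 | 5.19 | 0.6103 |
| Former Smoker | T Cell (CD4+) | −11.3 | 8.02 | 0.1578 |
| Former Smoker | T Cell (CD8+) | −20.3 | 8.28 | 0.0141 |
| Current Smoker | B Cell | 3.4 | 5.21 | 0.5183 |
| Current Smoker | Granulocyte | 31.6 | 11.26 | 0.0049 |
| Current Smoker | Monocyte | 17.8 | 10.49 | 0.0907 |
| Current Smoker | NK | 5.4 | 6.93 | 0.4373 |
| Current Smoker | T Cell (CD4+) | −21.8 | 10.25 | 0.0337 |
| Current Smoker | T Cell (CD8+) | −41.2 | 11.10 | 0.0002 |

Est = Regression coefficient estimate (×100%). SE₂ = Double-bootstrap standard error (×100%).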

TABLE 27 White blood cell distribution comparing matched pre-operative and post-operative cases in NSCLC mRNA data set

| Cell type | Est | SE₂ | p-value |
|---|---|---|---|
| B Cell | −10.7 | 5.55 | 0.0543 |
| Granulocyte | −19.4 | 11.16 | 0.0826 |
| Monocyte | −13.4 | 10.43 | 0.1987 |
| NK | 6.3 | 7.15 | 0.3794 |
| T Cell (CD4+) | −11.3 | 10.57 | 0.2859 |
| T Cell (CD8+) | 48.8 | 11.33 | 0.0000 |

Est = Regression coefficient estimate (×100%). SE₂ = Double-bootstrap standard error (×100%).

Example 37 An Array for High-Throughput DNA Methylation Analysis

An array for performing DNA methylation analysis in a high-throughput manner was made using VeraCode microbeads (Illumina, San Diego, Calif. USA) and DNA sequences of regions in 96 different genes, each sequence having one CpG dinucleotide shown within square brackets (FIG. 40) and used to determine methylation status of the gene. VeraCode beads are cylindrical glass microbeads 240 microns in length by 28 microns in diameter with a surface suitable for attaching DNA, RNA, protein, antibody and other ligands for performing bioassays. For performing DNA methylation analysis, various CpG specific DNA oligomers were attached to these beads. Each microbead is inscribed with a high-density holographic code (24-bit), allowing development of very large numbers of bead types. When a laser is shone at the high-density codes of the beads, they emit a signal specific to the code and the signal is detected by a CCD camera. The fluorescence of the bead indicates whether the particular CpG site carried by the bead is demethylated. The result is compared with the fluorescence readout obtained from DNA from a purified leukocyte sample. A VeraCode array is a collection of beads, each carrying a DNA oligomer specific for either the methylated or the unmethylated form of a particular CpG locus, distributed into different wells of a microtiter plate. A user selects the entirety or a subset of nucleotide sequences containing CpG sites in a gene or genes of interest for attaching to VeraCode beads to have a custom designed VeraCode array particularly advantageous for the user's analysis.

To ascertain which 96 CpGs would give optimal precision for the white blood cell (WBC) types, the following procedure was followed. The Infinium HumanMethylation 27K data corresponding to the magnetic activated cell sorting (MACS) sorted leukocyte DNA were assembled in the methylation module of GenomeStudio, and the quality of the data was assessed by calculating Mahalanobis distances. Forty-seven samples yielded acceptable data. A matrix of β-values was generated with rows defined by microarray CpG locus and columns defined by sample identification. A corresponding matrix indicating cellular phenotypes was also generated, with rows defined by sample identification (in precisely the same order as the columns in the corresponding matrix) and columns defining the cell lineage(s) to which each sample belongs.

A linear mixed effects (LME) model was applied to the Illumina Infinium HumanMethylation27 data, with WBC lineage as the fixed effect and beadchip plate as the random effect. The fixed effect groups were: Pan-T cell, CD4+ T cell, CD8+ T cell, Pan-NK cell, CD56^(dim) NK cell, CD56^(bright) NK cell, B cell, granulocyte, neutrophil, eosinophil, and monocyte. Across the gene loci, this model generated coefficients for each fixed effect group indicating relative estimates of DNA methylation for each of the different cell types. Collapsing categories accounted for the hierarchical relationships among cell lineages, and a linear transformation was applied to convert coefficient estimates to estimated mean value per cell type, resulting in a matrix $\tilde{B}_0$ of mean values, each row corresponding to a CpG locus and each column corresponding to a cell type. The model also generated an F-statistic for each locus that indicates how significantly different DNA methylation was between the cell types.

A stochastic search algorithm was then employed to select the differentially methylated regions (DMRs) that work best in concert on a custom microarray to distinguish leukocyte lineages, and would therefore be the most effective at quantifying immune cell types in a biological sample. The objective was to ascertain which 96 CpGs would give optimal precision for the WBC types.

The stochastic search algorithm was designed to maximize precision of estimated cellular fractions, under the assumption that the variance-covariance of the fraction estimates is proportional to $(\tilde{B}_0^T\tilde{B}_0)^{-1}$. To optimize precision for a single individual cell type, the corresponding diagonal element of $(\tilde{B}_0^T\tilde{B}_0)^{-1}$ was minimized; to optimize a set of cell types, the sum of the corresponding diagonal elements was minimized.

The general strategy was as follows. The engine is a stochastic search algorithm that starts with an initial set of CpGs, which is the beginning choice for the “current” set. On each iteration a randomly chosen CpG from the current set is switched out with a randomly chosen CpG from the remaining (unselected) CpGs, and precision is compared between the current set and the “candidate” set. If the candidate set gives better precision then the switch is accepted. Otherwise it is rejected. Ideally, by the end of the algorithm, the acceptance rate should be 0%.

The algorithm was run for 50,000 iterations starting with the 500 CpGs having the best F statistics. This was repeated ten times with different random number seeds each time. Then, the algorithm was run for 50,000 iterations starting with the CpGs having the 500 largest absolute effect sizes (coefficients generated by the LME model) for the WBC types. This was also repeated ten times with different random number seeds each time. Next, the 20 runs were compared and the algorithm was run for 50,000 iterations starting with the 500 most frequently chosen CpGs from the previous 20 runs. This was repeated five times with different random number seeds each time. Finally, a run was performed for 750,000 iterations starting with the 96 most frequently chosen CpGs from the previous five runs.
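The swap-and-compare search described above can be sketched as follows. This is a minimal illustration only and is not the code used in this example: the function names, the toy matrix of mean methylation values, the starting set of 20 CpGs and the iteration count are all hypothetical. The cost is the sum of diagonal elements of the inverse cross-product of the selected rows of the mean-value matrix, and a swap is kept only when that cost decreases.

```python
import numpy as np

def precision_cost(B, cell_types=None):
    """Sum of the chosen diagonal elements of (B^T B)^{-1}; smaller is better."""
    diag = np.diag(np.linalg.inv(B.T @ B))
    return float(diag.sum() if cell_types is None else diag[list(cell_types)].sum())

def stochastic_search(B_full, start, n_iter=2000, seed=0):
    """Swap one selected CpG for one unselected CpG; keep the swap only if it helps."""
    rng = np.random.default_rng(seed)
    current = [int(i) for i in start]
    chosen = set(current)
    pool = [i for i in range(B_full.shape[0]) if i not in chosen]
    cost = precision_cost(B_full[current])
    accepted = 0
    for _ in range(n_iter):
        i = int(rng.integers(len(current)))   # position to switch out
        j = int(rng.integers(len(pool)))      # unselected CpG to switch in
        candidate = current.copy()
        candidate[i] = pool[j]
        cand_cost = precision_cost(B_full[candidate])
        if cand_cost < cost:                  # better precision: accept the switch
            pool[j] = current[i]
            current, cost = candidate, cand_cost
            accepted += 1
    return sorted(current), cost, accepted

# Toy data (hypothetical): 200 CpGs x 4 cell types of mean methylation values.
rng = np.random.default_rng(1)
B_toy = rng.uniform(0.0, 1.0, size=(200, 4))
start = list(range(20))
initial_cost = precision_cost(B_toy[start])
selected, final_cost, n_accepted = stochastic_search(B_toy, start)
```

Passing `cell_types` to `precision_cost` restricts the sum to the diagonal elements for a chosen subset of cell types, which corresponds to optimizing precision for that subset only.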

Example 38 Mediation Analysis for Estimating Effects of an Exposure or Phenotype on Measured DNA Methylation

A method is described for conducting a mediation analysis to estimate the effects of an exposure or to estimate the effects of a specific phenotype on measured DNA methylation along two paths: through changes in WBC distribution, and directly, unmediated by changes in WBC distribution. Most epigenome-wide association scans (EWAS) have attempted to estimate the marginal effect ($\beta$, depicted in FIG. 41A) on measured DNA methylation, that is, the effect not adjusted for WBC distribution. However, a significant portion of the effect on DNA methylation is mediated through changes in WBC distribution as shown in FIG. 41B. Of interest in EWAS is $\alpha$, the direct effect adjusted for WBC distribution. Estimating this effect requires estimation of two other quantities: $\Gamma$, the effect of exposure or phenotype on WBC distribution, and $\xi$, the effect of WBC distribution on methylation. If $y_i$ is the DNA methylation measured for subject $i$ at a particular CpG site ($j$, subscript suppressed for clarity), $z_i$ is a $p \times 1$ vector of covariates for subject $i$ (including the exposure or phenotype of interest), and $\omega_i$ is the subject-specific WBC distribution estimated using constrained projection in the manner described in Example 1, then $y_i = z_i^T\alpha + \omega_i^T\xi + e_i$, where $e_i$ is a zero-mean error. Additionally, the effect of exposure/phenotype on WBC distribution can be modeled as $\omega_i = \Gamma z_i + u_i$, where $u_i$ is a zero-mean error vector. It is noted that $\alpha$ is a $p \times 1$ vector, and $K$ cell types are assumed, so that $\omega_i$ is a $K \times 1$ vector, $\Gamma$ is a $K \times p$ matrix, and $\xi$ is a $K \times 1$ vector. It follows that $y_i = z_i^T(\alpha + \Gamma^T\xi) + u_i^T\xi + e_i$, so that the marginal effect $\beta$ is the $p \times 1$ vector $\alpha + \Gamma^T\xi$.

Estimation proceeds first by computing $\hat{\Gamma} = \left(\sum_{i=1}^{n}\omega_i z_i^T\right)\left(\sum_{i=1}^{n} z_i z_i^T\right)^{-1}$, then computing $\hat{u}_i = \omega_i - \hat{\Gamma} z_i$, $r_i = (z_i^T, \hat{u}_i^T)^T$, and $\hat{\zeta} = \left(\sum_{i=1}^{n} r_i r_i^T\right)^{-1}\left(\sum_{i=1}^{n} r_i y_i\right)$, extracting $\hat{\xi}$ as the last $K$ components of $\hat{\zeta}$, and obtaining $\hat{\alpha}$ by subtracting $\hat{\Gamma}^T\hat{\xi}$ from the first $p$ components of $\hat{\zeta}$ (which estimate the marginal effect $\beta$).
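The estimation steps above can be illustrated with a short numerical sketch. It assumes that the first factor is the sum of outer products of the cell distributions with the covariates and that the second factor is inverted, as in an ordinary least-squares regression of cell distribution on covariates. The function name, the simulated data and all parameter values are hypothetical, and the simulated "cell fractions" are unconstrained toy values rather than true proportions.

```python
import numpy as np

def mediation_estimates(Z, W, y):
    """Z: n x p covariates, W: n x K cell distributions, y: length-n methylation."""
    p = Z.shape[1]
    # Gamma-hat (K x p): regression of cell distribution on covariates.
    Gamma_hat = (W.T @ Z) @ np.linalg.inv(Z.T @ Z)
    U_hat = W - Z @ Gamma_hat.T            # residuals u-hat_i, one per row
    R = np.hstack([Z, U_hat])              # rows are r_i^T = (z_i^T, u-hat_i^T)
    zeta_hat = np.linalg.solve(R.T @ R, R.T @ y)
    beta_hat = zeta_hat[:p]                # first p components: marginal effect
    xi_hat = zeta_hat[p:]                  # last K components
    alpha_hat = beta_hat - Gamma_hat.T @ xi_hat   # direct effect
    return alpha_hat, beta_hat, Gamma_hat, xi_hat

# Simulated toy data (hypothetical values throughout).
rng = np.random.default_rng(0)
n, p, K = 205, 3, 4
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, size=n)])
Gamma = rng.normal(scale=0.1, size=(K, p))
W = Z @ Gamma.T + rng.normal(scale=0.05, size=(n, K))
alpha = np.array([0.5, 0.02, 0.0])
xi = rng.normal(scale=0.2, size=K)
y = Z @ alpha + W @ xi + rng.normal(scale=0.01, size=n)
alpha_hat, beta_hat, Gamma_hat, xi_hat = mediation_estimates(Z, W, y)
```

Because the residuals are orthogonal to the covariates, the direct-effect estimate obtained this way coincides with the covariate coefficients from a single regression of methylation on covariates and cell distribution together; the two-step form additionally yields the marginal effect and the exposure-to-WBC effect.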

Statistical inference is achieved by permutation. Specifically, the null distributions of $\hat{\alpha}$ and $\hat{\Gamma}$ are obtained by permuting the exposure or phenotype of interest within $z$ (only the components representing the covariate to be tested), and the null distribution of $\hat{\xi}$ is obtained by permuting the subject assignments corresponding to $\omega_i$. Adjustments for multiple comparisons are achieved by nesting within each permutation a loop that estimates $\hat{\alpha}_j$, $\hat{\Gamma}_j$, and $\hat{\xi}_j$ for each individual CpG, with adjusted p-values obtained by comparing the maximum absolute values of $\hat{\alpha}_j$, $\hat{\Gamma}_j$, and $\hat{\xi}_j$ (over the CpGs) to the corresponding statistics computed from each individual permutation. For comparison purposes, a similar permutation test can be applied for the marginal coefficient $\beta$.
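A simplified sketch of the maximum-statistic permutation test follows. For brevity it uses a per-CpG marginal slope as the statistic rather than the full direct-effect estimate, and it permutes a single exposure vector; the function names, the toy data and the number of permutations are hypothetical. The adjusted p-value is the share of permutations whose maximum absolute statistic over the CpGs reaches the observed maximum, so a value of zero would be reported as less than one over the number of permutations.

```python
import numpy as np

def slopes(x, Y):
    """Per-CpG least-squares slope of methylation (columns of Y) on exposure x."""
    xc = x - x.mean()
    return xc @ (Y - Y.mean(axis=0)) / (xc @ xc)

def max_stat_p_value(x, Y, n_perm=500, seed=0):
    """Compare the observed maximum |statistic| with its permutation distribution."""
    rng = np.random.default_rng(seed)
    observed_max = np.abs(slopes(x, Y)).max()
    # Inner loop over CpGs is vectorized inside slopes(); outer loop permutes x.
    null_max = np.array([np.abs(slopes(rng.permutation(x), Y)).max()
                         for _ in range(n_perm)])
    return float(np.mean(null_max >= observed_max))

# Toy data (hypothetical): 60 subjects, 30 CpGs, one CpG with a strong effect.
rng = np.random.default_rng(2)
x = rng.normal(size=60)
Y = rng.normal(size=(60, 30))
Y[:, 0] += 5.0 * x
p_adjusted = max_stat_p_value(x, Y)
```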

This method was applied to a data set consisting of n=205 control subjects in a bladder cancer case/control study (Karagas M R et al., 1998, Environ Health Perspect 106: 1047-1050). Four separate analyses were performed: (1) the phenotype of interest was age; (2) the exposure of interest was current smoker status; (3) the exposure of interest was toenail arsenic; and (4) the exposure of interest was reported use of hair dye. Sex was included as a covariate in analyses, and age was included in (2)-(4).

The relationship between $\hat{\alpha}$ and $\hat{\beta}$ for the covariate of interest over autosomal CpGs is shown in FIG. 42. Dots represent overall methylation as indicated by the first component of the coefficient vector $\hat{\beta}$, corresponding to the intercept (light=low, black=moderate, dark=high). The diagonal straight line represents the identity ($\hat{\alpha} = \hat{\beta}$). The curve depicts a loess fit to the scatter plot. In each of the cases there is an S-shaped relationship that shows attenuation of effect ($\hat{\alpha}$ tends to be smaller than $\hat{\beta}$). Table 28 shows the multiple-comparisons adjusted p-values for each coefficient corresponding to the covariate of interest ($\beta$, $\alpha$, $\gamma$) and overall WBC distribution effect on DNA methylation ($\xi$), obtained by permutation test using 5000 permutations. As shown in the table, significance of $\alpha$ may be greater than, less than, or equal to the significance of $\beta$. Remarkably, in every case, the covariate of interest shows a strongly significant association with WBC distribution. It is noted that WBC shows significant overall association with DNA methylation.

TABLE 28 Multiple-comparisons adjusted p-values

Exposure/Phenotype   β        α        γ         ξ
Age                  0.0358   0.0838   <0.0002   0.0100
Current Smoker       0.0326   0.0200   <0.0002   0.0134
Toenail Arsenic      0.1054   0.0512   <0.0002   0.0148
Dye Use              0.2614   0.2570   <0.0002   0.0102

Example 39 Comparison of Methods Herein for Estimating Fractions of Blood Cell Types with Non-Negative Matrix Factorization (NNMF)

The methods herein are predicated on the relationship

${{E\left( Y_{i} \right)} = {\sum\limits_{l = 0}^{d_{0}}{b_{0\; l}\omega_{il}}}},$

where $Y_i$ is a vector of DNA methylation measurements obtained for subject $i$, $d_0$ is the number of blood cell types to be assayed, $\omega_{il}$ are the fractions of each blood cell type corresponding to subject $i$, and $b_{0l}$ (written $b_l$ below) is the vector of methylation fractions corresponding to blood cell type $l$; the methods herein provide techniques for estimating the fractions $\omega_{il}$ assuming the values of $b_l$ have been obtained from an external validation data set. In contrast, non-negative matrix factorization (NNMF) could be used to estimate $\omega_{il}$ and $b_l$ simultaneously in the absence of an external validation set. In the context of NNMF, the $d_0$ vectors $\omega_{\bullet l}$ are considered “factors”, and the $d_0$ vectors $b_l$ (assumed to represent individual methylation profiles) are considered “basis vectors”; the number of factors $d_0$ must be provided to the NNMF algorithm.

Using the 12 experimental samples described in Example 5, NNMF was compared to methods herein (Examples 1-3). The highest ranking 100 and 500 pseudo-DMRs were selected on the basis of informativeness as in Example 4; for each choice, the constrained projection described in Examples 1 and 5 was used to impute specific cell distributions, then NNMF was performed assuming four, five, and six factors (i.e. factor values assumed to represent the fractions $\omega_{il}$ for one cell type $l$). The nmf function in the R package NMF was used with default settings. Since NNMF requires random inputs, NNMF was applied 100 times, each with different randomly generated starting values according to the default settings of the nmf function. Six cases were considered, viz., 100 CpGs and 500 CpGs for each of four, five and six factors. For each of the 100 runs in each of the six cases, the fitted factors $\omega_{\bullet l}$ (values of which were assumed to correspond to fractions $\omega_{il}$) were correlated to expected fractions of B cells, T cells, monocytes, and granulocytes, and for each specific cell type, the factor with the maximum correlation to that type was assigned to it. Then, for each cell type in each case, the median correlation with the assigned factor was tabulated. Table 29 below reports these median values, and Table 30 reports the correlation between expected fraction and the fraction observed using methods herein. A comparison of these tables demonstrates that, though NNMF can achieve high correlation with expected cell fraction if the pseudo-DMRs are known in advance, the methods described herein in Examples 1-4 still achieve higher correlation. In addition, NNMF occasionally fails to match known cell types to imputed cell types in a monomorphic manner. Table 31 reports the percentage of runs for which at least two different cell types were matched via NNMF to the same factor.
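The factor-assignment step described above (each cell type is assigned the fitted factor with which it correlates most strongly, and a run is flagged when two cell types share a factor) can be sketched as follows. The function name and the toy data are hypothetical; the toy factors are scaled copies of the expected fractions plus one noise factor, not output of the nmf function.

```python
import numpy as np

def match_factors(expected, factors):
    """expected: n x C expected fractions; factors: n x F fitted factor values."""
    C = expected.shape[1]
    # Cross-correlation block between cell types (rows) and factors (columns).
    corr = np.corrcoef(expected.T, factors.T)[:C, C:]
    assigned = corr.argmax(axis=1)          # best-correlated factor per cell type
    best = corr.max(axis=1)
    collision = len(set(assigned.tolist())) < C   # two cell types share a factor
    return assigned, best, collision

# Toy data (hypothetical): 12 samples, 4 cell types, 5 factors.
rng = np.random.default_rng(3)
expected = rng.dirichlet(np.ones(4), size=12)
factors = np.column_stack([2.0 * expected[:, 2],
                           0.5 * expected[:, 0] + 0.1,
                           expected[:, 3],
                           3.0 * expected[:, 1],
                           rng.normal(size=12)])
assigned, best, collision = match_factors(expected, factors)
```

Repeating this over many NNMF runs and taking the median of `best` per cell type would give values of the kind tabulated in Table 29, and the share of runs with `collision` set would give values of the kind tabulated in Table 31.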

It is expected that NNMF would behave less favorably than methods described herein (Examples 1-4), since NNMF requires the estimation of $(n+M)F$ unknown parameters (where $n$ = number of target samples, $M$ = number of CpGs, and $F$ = number of factors) and methods herein require the estimation of only $nK$ unknown parameters, where $K<F$ and $K$ is the number of known cell types.
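As a worked illustration of these parameter counts, take the 12 target samples and 100 CpGs of this example, and assume for illustration five factors and four known cell types (B cells, T cells, monocytes and granulocytes):

$(n+M)F = (12+100)\times 5 = 560 \quad\text{versus}\quad nK = 12\times 4 = 48.$

Under these assumed values NNMF would estimate more than ten times as many unknowns from the same data.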

TABLE 29 Median correlation for two different sets of CpG containing sequences

               Factors = 4   Factors = 5   Factors = 6
100 CpGs
B cells        0.998         0.996         0.996
T cells        0.988         0.989         0.990
Monocytes      0.832         0.900         0.927
Granulocytes   0.967         0.954         0.963
500 CpGs
B cells        0.998         0.996         0.996
T cells        0.985         0.993         0.990
Monocytes      0.798         0.896         0.879
Granulocytes   0.943         0.977         0.970

TABLE 30 Correlation between expected fraction and the fraction observed using methods herein.

               100 DMRs   500 DMRs
B cells        1.000      1.000
T cells        0.998      0.997
Monocytes      1.000      1.000
Granulocytes   0.997      0.999

TABLE 31 Percentage of runs for which at least two different cell types were matched to the same factor

Factors   DMRs = 100   DMRs = 500
4         4            2
5         0            1
6         0            0

Example 40 Quantitation of T Cell, Treg and CD16+CD56^(dim) NK Cell Numbers by CD3Z, FoxP3 and NKp46 Methylation Assays, Respectively Using Droplet Digital PCR

A droplet digital PCR technique was used to quantitate T cell, Treg and CD16+CD56^(dim) NK cell numbers using CD3Z, FoxP3 and NKp46 methylation assays described in Examples 15 and 30. Digital PCR (dPCR) is a refinement of conventional PCR methods and is used to directly quantify and clonally amplify nucleic acids. dPCR differs from traditional PCR in the method of measuring nucleic acid amounts: the sample is separated into a large number of partitions, and the reaction in each partition is carried out individually. This separation produces a more reliable collection and a more precise and sensitive measurement of nucleic acid amounts.

Isolated and purified T cells and Tregs were serially diluted, and copies of each of the targets were quantified as measures of cell numbers. Bisulfite converted DNA from whole blood, from isolated human T-cells and Treg cells, and from NK cells was quantified using the emulsion partitioning method of the BioRad QX100™ Droplet Digital™ PCR (ddPCR™) system. This system creates partitioned PCR reactions using water-in-oil droplets for performing high-throughput digital PCR. The QX100 droplet generator partitions samples into 20,000 nanoliter-sized droplets. After PCR using a thermal cycler, droplets from the samples were streamed in single file on a reader (QX100 droplet reader). The PCR-positive and PCR-negative droplets were counted to obtain quantification of target DNA in digital form. Results are shown in FIGS. 43-46 as dot plots of fluorescence intensities of the droplets, with each point on the plot representing a single droplet. The horizontal lines are cutoffs between “positive” and “negative” droplets for each sample. A measure of concentration of the target sequence (demethylated CD3Z, FoxP3 or NKp46) in copies per microliter was obtained as readout from the system. Dividing target sequence concentration by total DNA concentration obtained by C-less PCR yielded the percent of total DNA that was positive for the target DNA region (FIGS. 45-46).
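The arithmetic behind the readout can be sketched as follows. In this example the concentration is reported by the instrument software; the Poisson correction shown here is the usual digital PCR calculation and is included as an assumption, not as a description of that software. The droplet volume, droplet counts and concentrations are hypothetical values chosen for illustration.

```python
import math

def copies_per_microliter(positive, accepted, droplet_nl=0.85):
    """Poisson-corrected target concentration from droplet counts.

    droplet_nl is an assumed droplet volume in nanoliters (hypothetical default).
    """
    if not 0 <= positive < accepted:
        raise ValueError("need 0 <= positive < accepted droplets")
    # Mean copies per droplet, allowing for droplets holding more than one copy.
    copies_per_droplet = -math.log(1.0 - positive / accepted)
    return copies_per_droplet / (droplet_nl * 1e-3)   # nanoliters -> microliters

def percent_target(target_conc, total_conc):
    """Percent of total DNA positive for the target region."""
    return 100.0 * target_conc / total_conc

# Hypothetical counts: 3,000 positive droplets out of 15,000 accepted droplets.
target_conc = copies_per_microliter(3000, 15000)
example_percent = percent_target(150.0, 600.0)
```

The second function is the division described above: a target concentration of 150 copies per microliter against a total DNA concentration of 600 copies per microliter corresponds to 25 percent.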

Data in the figures show that successful amplification and detection of CD3Z and FoxP3 DMRs, respectively, were obtained. FIG. 43A and FIG. 44A show dot plots distinguishing positive droplets from negative droplets. FIG. 43B and FIG. 44B show the calculated absolute numbers of positive PCR droplets. Results obtained from dilution of standard purified T cells show correspondence of quantities of CD3Z and FoxP3 genes with extent of dilution, and hence validity of dPCR as a detection method for methylation based assay of immune cell identity. Other partitioning approaches have been developed that employ microfluidic manipulation, and results similar to the data obtained herein are expected from the use of such other methods of partitioning. FIG. 45 shows quantitation of purified NK cells under different conditions and FIG. 46 shows quantitation of whole blood and of purified leukocyte subsets by measuring the demethylated NKp46 DMR described in Example 30.

Example 41 Sample Workflow

FIG. 47 summarizes the workflow carried out for samples derived from human whole blood utilized in the following examples. FIG. 51 describes 85 venous whole blood samples that were collected from disease free human donors. Of these, 79 samples were used for isolation of target cell type by magnetic activated cell separation (MACS) and six samples were subjected to conventional immune profiling in which fresh aliquots are analyzed by protein based methods. Purity of the 79 samples isolated by MACS was confirmed by fluorescence activated cell sorting (FACS). The six samples analyzed by conventional immune profiling were stored under 12 specific storage conditions which differed in anticoagulant, temperature, and duration, which yielded 72 samples.

DNA was extracted from each of the 79 FACS-verified samples and the 72 samples from the 12 specific storage conditions. Aliquots of the genomic DNA from five of the 79 FACS-verified, DNA extracted samples were combined in quantities that mimicked human blood, artificially reconstituting peripheral blood. Aliquots of each of the seven cell DNA mixtures, the 79 FACS-verified DNA extracted samples, and the 72 samples from the 12 specific storage conditions were then randomized. Aliquots of the resulting 158 samples were contacted with sodium bisulfite, which is used in the analysis of methylation status of cytosines in DNA, and the sodium bisulfite treated aliquots of the 158 samples were analyzed using each of a high-density methylation microarray (HDMA) and a low-density methylation microarray (LDMA).

Data for DNA methylation microarrays are available at an NCBI website entitled, “Gene Expression Omnibus” (GEO) in accordance with a protocol known as, “Minimum Information About a Microarray Experiment” (MIAME). The methods, materials and conditions described in this example and FIG. 47 are fully described in the following examples.

Example 42 Purified Leukocyte Subtypes

Venous whole blood samples were collected from 79 disease-free human donors whose demographic characteristics are shown in Table 32. A homogeneous population of one specific type of leukocyte was obtained from each sample by MACS, a method of cell separation that utilizes antibody-conjugated magnetic microbeads and a combination of positive and negative selection protocols (Miltenyi Biotec Inc., Auburn, Calif.). Purity of the 79 purified cell samples was determined by FACS. Representative FACS results for 15 sample types are shown in FIG. 48. The hierarchical relationship between the different populations of MACS purified leukocyte subtypes, and the number of replicate samples for each cell type, is shown in FIG. 49.

TABLE 32 Demographic characteristics of blood donors for purified cells

Total number                     79
Age, Mean (SD)                   30 (9)
Weight (lbs), Mean (SD)          181 (38)
Height (inches), Mean (SD)       69 (3.7)
Gender
  Male, No. (%)                  62 (78%)
  Female, No. (%)                15 (19%)
  Unknown, No. (%)               2 (3%)
Race
  White, No. (%)                 32 (41%)
  Hispanic, No. (%)              12 (15%)
  Black, No. (%)                 13 (16%)
  Asian, No. (%)                 13 (16%)
  Native American, No. (%)       3 (4%)
  Unknown/Other, No. (%)         6 (8%)
Tobacco smoking
  Yes, No. (%)                   13 (16%)
  No, No. (%)                    33 (42%)
  Unknown, No. (%)               33 (42%)

Example 43 Conventionally Profiled Whole Bloods

Six additional venous whole blood samples were collected from different disease free human donors whose demographic characteristics are summarized in Table 32. The workflow for these samples is shown in FIG. 58. Each whole blood sample was divided into three aliquots, each containing a different anticoagulant: heparin, citrate, or EDTA. A portion of the aliquot in heparin was used to perform conventional immune profiling methods, including flow cytometry (described below), manual 5-part white blood cell differential, and CBC with automated 5-part white blood cell differential. Another portion of this aliquot for each sample was analyzed for methylation assessment using the high-density DNA methylation microarray (HDMA; described in examples below). Another portion was analyzed for methylation assessment using the low-density methylation array (LDMA; described in examples below) directly without storage. Aliquots for each of the six blood samples were each stored overnight at one of three temperatures (room temperature, 4° C., and −80° C.) prior to methylation assessment on the HDMA.

Example 44 Differential Leukocyte Counts

Manual white blood cell (WBC) counts were performed according to established standards (Koepke J A. 1977 Differential Leukocyte Counting. Skokie, Ill.: College of American Pathologists; Houwen B. 2001 The differential cell count. Laboratory Hematology 89-100). Automated WBC counts were performed using the XE-5000™ Automated Hematology System (Sysmex America, Inc., Mundelein, Ill.) according to manufacturer instructions. The following cell types were enumerated: total WBC, lymphocytes, monocytes, neutrophils, basophils and eosinophils.

Example 45 Fluorescence Activated Cell Sorting (FACS) of Leukocyte Subsets

Blood samples were directly stained for cell surface markers and were incubated for 20 minutes in the dark at 4° C. Antibodies were purchased from eBioscience Inc (San Diego, Calif.). Each blood sample was divided into two aliquots. The first aliquot cells were stained with: anti-human CD3e FITC (catalog number 11-0039-41), anti-human CD4 APC-eFluor 780 (catalog number 47-0049-41), anti-human CD8a 605NC (catalog number 93-0088-41), anti-human CD16 PE-Cy7 (catalog number 25-0168-41), anti-human CD25 APC (catalog number 17-0259-41), anti-human CD45 PerCP-Cy5.5 (catalog number 45-9459-41), anti-human CD56 PE (catalog number 12-0567-41), and anti-human CD127 eFluor 127 (catalog number 48-1278-41) to analyze T-cells, NKT cells, and NK cells. The second aliquot cells were stained with: anti-human CD14 FITC (catalog number 11-0149-41), anti-human CD15 eFluor 450 (catalog number 48-0159-41), anti-human CD16 PE-Cy7, anti-human CD19 APC-eFluor 780 (catalog number 47-0199-41), anti-human CD45 PerCP-Cy5.5, and anti-human CD123 PE (catalog number 12-1239-41) to analyze B-cells, monocytes, and granulocytes (neutrophils, eosinophils, and basophils).

Unstained, isotype, and fluorescence-minus-one (FMO) controls were used to determine sample gating and background. Individual compensation controls were used in each sample run. CountBright counting beads (Invitrogen, catalog number C36950) were added for quantification of total leukocytes and each subset. Acquisition was performed within 12 hours of blood draw on the FACSAria III flow cytometer (Becton Dickinson) using FACSDiva Software (Becton Dickinson). An acquisition limit of 10,000 events was used on the monocyte gate, using FSC versus SSC dot plot, for each aliquot. Final data analysis and presentation of results was done using FlowJo software (TreeStar Inc).

Cell types and detection parameters were set as follows: Lymphocytes: low SSC (side scatter) and low FSC (forward scatter); B-cells: CD45+ and CD19+; T-cells: CD45+ and CD3+; Helper T-cells (Th): CD3+ and CD4+; Regulatory T-cells (Tregs): CD3+ and CD4+ and CD25+ and FOXP3+; Cytotoxic T-cells (Tc): CD3+ and CD8+; Natural Killer T-cells (NKT): CD3+ and CD56+; Natural Killer (NK) cells: CD3− and CD56+; Effector NK cells: CD3− and CD16+ and CD56 dim (i.e. lower level); Regulatory NK cells: CD3− and CD16− and CD56 bright (i.e. higher level); CD8+ NK cells: CD3− and CD8+ and CD56+; CD8− NK cells: CD3− and CD8− and CD56+; Granulocytes: high SSC (side scatter) and high FSC (forward scatter); Eosinophils: CD44+ and high SSC and high FSC; Basophils: CD123+ and high SSC and high FSC; Neutrophils: CD15+ and CD16+ and high SSC and high FSC; Monocytes: low SSC (side scatter) and high FSC (forward scatter) and CD14+.

Example 46 DNA Extraction

Genomic DNA was extracted and purified from whole blood and from MACS purified leukocyte samples using AllPrep DNA/RNA/Protein Mini Kit (QIAGEN, catalog number 8004) or DNeasy blood and tissue kit (QIAGEN, catalog number 69506) according to manufacturer's instructions and protocol. DNA was quantified by NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Inc.). DNA samples for some applications were further purified using the DNA Clean and Concentrator according to manufacturer's protocol (ZYMO Research Corporation, catalog number D4004). Samples were kept at 4° C. for short-term storage or at −20° C. for long-term storage.

Example 47 Artificial Blood Samples

Genomic DNA from five of the purified leukocyte samples was combined in quantities that mimicked human blood under seven clinical conditions (Table 33). DNA was mixed thoroughly and stored briefly at 4° C. prior to analysis.

TABLE 33 Proportions of DNA from purified cells combined into mixtures that artificially reconstruct blood under clinical conditions

Clinical condition       T-cells   B-cells   NK cells   Granulocytes   Monocytes
normal                   20%       2.5%      1.5%       67%            9%
T-cell lymphopenia-1     6%        6%        5%         70.5%          12.5%
T-cell lymphopenia-2     2%        7%        6%         71.5%          13.5%
granulocytosis           10%       0%        0%         90%            0%
granulocytopenia         34.5%     17%       16%        9%             23.5%
B-cell lymphoma          20.5%     0.5%      2%         67.5%          9.5%
monocytosis              14%       0%        0%         61%            25%

Example 48 Sodium Bisulfite Conversion

Genomic DNA from six conventionally profiled whole blood samples, genomic DNA from the 79 purified leukocyte samples, and DNA mixtures in the seven artificial blood samples were randomized and treated with sodium bisulfite using ZYMO EZ-96 DNA Methylation Kit (ZYMO Research Corp., catalog number D5004), and were stored at −80° C. until used. This procedure prepares DNA for assessment of DNA methylation by converting unmethylated cytosine residues to uracil.

Example 49 High-Density DNA Methylation Microarray (HDMA)

Methods were developed to analyze patterns of cell-lineage specific DNA methylation and to examine the viability of the mathematical models herein. Forty-six of the purified leukocyte DNA samples, six of the artificial blood reconstruction samples (excluding T-cell lymphopenia 1), and the six conventionally profiled whole blood samples were analyzed using the Infinium® HumanMethylation27 Beadchip microarray (Illumina Inc., San Diego, Calif.). This platform was used to quantify the methylation status of 27,578 CpG loci from 14,495 genes, with a redundancy of 15-fold to 18-fold. The ratio of fluorescent signals was computed from both alleles using the following equation: $\beta = \max(M,0)/(|U|+|M|+100)$, where $M$ and $U$ are the methylated and unmethylated signal intensities. The resultant β-value was a continuous variable from 0 (unmethylated) to 1 (completely methylated) that represents the methylation at each CpG site and was used in subsequent statistical analyses. Data were assembled with the methylation module of GenomeStudio software, a product of Illumina, Inc. (Bibikova, M. et al. 2009 Epigenomics 1:177-200).
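A minimal sketch of the β-value calculation follows, assuming the constant 100 belongs in the denominator together with the absolute signal intensities, as in the usual Illumina definition. The function name and the example intensities are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation beta-value from methylated (M) and unmethylated (U) signals.

    The offset keeps the ratio stable when both signals are close to zero.
    """
    return max(methylated, 0.0) / (abs(unmethylated) + abs(methylated) + offset)

# Hypothetical intensities for three loci: mostly methylated, mixed, unmethylated.
example_betas = [beta_value(9000.0, 900.0),
                 beta_value(2450.0, 2450.0),
                 beta_value(0.0, 5000.0)]
```

Because of the offset, a fully methylated locus gives a value slightly below 1, and a locus with no methylated signal gives exactly 0.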

Following the crosscheck optimization procedure, a minimum number of 34 CpG loci were selected to establish DNA methylation signatures for the HDMA reference library. These loci were found in the following genes: CLEC9A (2 loci) (SEQ ID NO:119), INPP5D (SEQ ID NO:120), INHBE (SEQ ID NO:28), UNQ473 (SEQ ID NO:121), SLC7A11 (SEQ ID NO:122), ZNF22 (SEQ ID NO:11), XYLB (SEQ ID NO:123), HDC (SEQ ID NO:26), RGR (SEQ ID NO:124), SLCO2B1 (SEQ ID NO:125), C1orf54 (SEQ ID NO:126), TM4SF19 (SEQ ID NO:127), IGSF6 (SEQ ID NO:28), KRTHA6 (SEQ ID NO:128), CCL21 (SEQ ID NO:129), SLC11A1 (SEQ ID NO:130), FGD2 (SEQ ID NO:2), TCL1A (SEQ ID NO:131), MGMT (SEQ ID NO:132), CD19 (SEQ ID NO:133), LILRB4 (SEQ ID NO:134), VPREB3 (SEQ ID NO:135), FLJ10379 (SEQ ID NO:136), HLA-DOB (SEQ ID NO:43), EPS8L3 (SEQ ID NO:4), SHANK1 (SEQ ID NO:137), CD3D (2 loci) (SEQ ID NO:93), CHRNA3 (SEQ ID NO:138), CD3G (2 loci) (SEQ ID NO:92), RARA (SEQ ID NO:139), GRASP (SEQ ID NO:140).

Example 50 Low-Density DNA Methylation Microarray (LDMA)

To thoroughly validate the DNA methylation-based approach to immune profiling used herein, methods in examples herein were performed to analyze the 79 purified leukocyte samples, the seven artificial blood reconstruction samples, and the 72 samples of the six conventionally profiled whole blood samples (each stored under 12 different conditions) by the VeraCode® custom GoldenGate® Methylation assay (GGMA). The assay used a four-probe design to differentiate between methylated and unmethylated sequences for a custom panel of 96 different CpG loci. The method generated DNA targets through allele-specific amplification using universal primers, and hybridization to a bead array at sites bearing complementary address sequences. The hybridized targets contained a fluorescent label denoting a methylated or unmethylated state for a given locus.

Methylation status of each interrogated CpG site was calculated as the ratio of fluorescent signal from one allele relative to the sum of both methylated and unmethylated alleles, thereby generating a β-value ranging from 0 (unmethylated) to 1 (fully methylated). Several different control types were used to ensure data quality. Each bead type was represented with an average 30-fold redundancy. Data were assembled with the methylation module of GenomeStudio software (Illumina, Inc.).

Following the crosscheck optimization procedure, a minimum number of 20 CpG loci were selected to establish DNA methylation signatures for the LDMA reference library. The selected loci were found in the following genes: FGD2 (SEQ ID NO:2), HLA-DOB (SEQ ID NO:43), BLK (SEQ ID NO:40), IGSF6 (SEQ ID NO:28), CLDN15 (SEQ ID NO:29), SFT2D3 (SEQ ID NO:89), ZNF22 (SEQ ID NO:11), CEL (SEQ ID NO:39), HDC (SEQ ID NO:26), GSG1 (SEQ ID NO:67), FCN1 (SEQ ID NO:53), OSBPL5 (SEQ ID NO:64), LDB2 (SEQ ID NO:36), NCR1 (SEQ ID NO:91), EPS8L3 (SEQ ID NO:4), CD3D (SEQ ID NO:93), PPP6C (SEQ ID NO:7), CD3G (SEQ ID NO:92), TXK (SEQ ID NO:30), FAIM (SEQ ID NO:32).

Example 51 Statistical Methods

Statistical analyses in the examples herein were performed using the R statistical platform (www.R-project.org).

Example 52 Identification of Cell Lineage-Specific Methylation

A linear mixed effects (LME) model was applied to the purified leukocyte HDMA data with cell type designated as the fixed effect and beadchip as the random effect (controlling for plate effects) to identify DNA methylation signatures that represent biomarkers of leukocyte subtypes. This method generated F-statistics for every CpG on the array indicating how well differential methylation at that locus distinguishes seven different leukocyte lineages: T-cells, B-cells, NK cells, monocytes, eosinophils, basophils, and neutrophils. This calculation also generated seven coefficients for each CpG indicating directionality and intensity of differential methylation at that locus for the cell types.

Example 53 Selection of CpG Panel for Immune Profiling

Using the LME results, a stochastic search algorithm was implemented to determine the best combination of putative DMRs to use for the simultaneous assessment of T-cells, B-cells, NK cells, monocytes, and granulocytes in a human blood sample. This algorithm was used to assess the predictive ability of a selected panel of CpG loci by analyzing the variance in methylation across cell types as designated in a contrast matrix. If substituting a randomly selected locus for one of the loci in the panel improved the predictive ability, the substitution was accepted and the new locus replaced the old in the panel. This search algorithm was implemented for 50,000 iterations starting from ten different random number seeds in three stages: first starting with the top 500 F-statistics, then the top 500 absolute effect sizes (based on the LME coefficients), and then the top 500 from the first two stages. The stochastic search algorithm was then run one additional time, starting from the top 96 from the final stage above, until the acceptance rate for substitutions definitively dropped to zero.

Example 54 DNA Methylation-Based Cell Quantification

To estimate cell mixtures by DNA methylation marks, methods herein employed a constrained projection, in which a DNA methylation profile from a target sample is projected onto mean methylation profiles for isolated cell types, subject to the constraint that the projection values (estimated mixing weights) are greater than or equal to zero and sum to no more than one. The mean values were obtained from a reference library of DNA methylation signatures, and the projection was implemented via quadratic programming (Goldfarb, D. and Idnani, A. 1982 Dual and Primal-Dual Methods for Solving Strictly Convex Quadratic Programs. In: Hennart J P, ed. Numerical Analysis. Berlin: Springer-Verlag pages 226-39; Goldfarb D et al. 1983 Mathematical Programming 27:1-33; Houseman, E. A. et al. 2012 BMC Bioinformatics 13:86).
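The constrained projection can be illustrated with a small numerical sketch. It does not use the dual quadratic programming method cited above; for self-containment it swaps in projected gradient descent, which solves the same least-squares problem under the same constraints (weights at least zero and summing to at most one). The function names, the randomly generated reference matrix and the noise-free target profile are hypothetical.

```python
import numpy as np

def project_capped_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) <= 1}."""
    w = np.clip(v, 0.0, None)
    if w.sum() <= 1.0:
        return w
    # The sum constraint is active: project onto the probability simplex instead.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.clip(v - theta, 0.0, None)

def estimate_fractions(B, y, n_iter=5000):
    """Minimize ||y - B w||^2 over the constraint set by projected gradient descent."""
    step = 1.0 / np.linalg.norm(B.T @ B, 2)      # 1 / largest eigenvalue of B^T B
    w = np.full(B.shape[1], 1.0 / B.shape[1])    # feasible starting point
    for _ in range(n_iter):
        grad = B.T @ (B @ w - y)
        w = project_capped_simplex(w - step * grad)
    return w

# Toy reference library (hypothetical): 100 CpGs x 5 cell types.
rng = np.random.default_rng(4)
B_ref = rng.uniform(0.0, 1.0, size=(100, 5))
w_true = np.array([0.20, 0.025, 0.015, 0.67, 0.09])   # "normal" row of Table 33
y_target = B_ref @ w_true                              # noise-free target profile
w_hat = estimate_fractions(B_ref, y_target)
```

With a noise-free target the estimated weights recover the mixing proportions used to build it; with real array data the estimates would differ from the true fractions by an amount that depends on measurement noise and on how well the selected CpGs separate the cell types.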

Example 55 Cell Differential Quantification Using DNA Methylation

Methods herein used DNA methylation to detect and quantify the proportions of each of T-cells, B-cells, NK cells, monocytes, basophils, eosinophils, and neutrophils in any single human blood sample. The first step in achieving this goal was to establish a reference library of DNA methylation signatures that serve as biomarkers for those cell types. A microarray was used to identify and to assess DNA methylation in WBC subsets purified from normal (disease-free) human blood, to generate a reference data set. To generate a target data set, DNA methylation at the same CpG loci as the reference data set was assessed in the target samples using the same platform used to establish the reference library. The cell types of interest were quantified in the target samples by projecting their DNA methylation profiles onto the mean methylation profiles for the purified WBC types of interest from the reference data set using quadratic programming (Houseman, E. A. et al. 2012 BMC Bioinformatics 13 (86): pages 1-16, which is hereby incorporated by reference herein in its entirety). Sample workflows are illustrated in FIG. 47.

Example 56 DNA Methylation Distinguishes WBC Subsets

Venous whole blood was collected from 79 disease free human donors (Table 32) and homogeneous populations of the WBC types of interest were isolated from each blood sample using magnetic activated cell separation (MACS) with the purity confirmed by FACS (FIG. 48). To account for inter-individual variation, at least four samples of each cell type were purified from different donors (FIG. 49). A subset of these purified cell samples were analyzed by a high-density methylation microarray (HDMA), the Infinium HumanMethylation27 (Illumina Inc., San Diego, Calif.), to identify patterns of WBC lineage-specific DNA methylation. See Houseman, E. A. et al. 2012 BMC Bioinformatics 13 (86): pages 1-16, which is hereby incorporated by reference herein in its entirety. The HDMA assessed DNA methylation at 27,578 CpG loci in 14,495 genes throughout the human genome. A linear mixed effects model applied to these data (with cell type as the fixed effect and beadchip as the random effect) revealed hundreds of CpG loci exhibiting lineage-specific DNA methylation patterns that distinguished the WBC types of interest.

A panel of 96 CpG loci that function in concert for DNA methylation-based immune profiling was selected for placement on a custom low-density DNA methylation microarray (LDMA), the VeraCode GoldenGate methylation array (Illumina Inc., San Diego, Calif.). The LDMA allowed independent confirmation of the HDMA results and more efficient use of resources for the quantification of WBC subsets in target samples. To select the panel, a bioinformatic search algorithm was applied that works in a stochastic manner, substituting CpG loci and assessing the predictive ability of the selected loci by analyzing the variance in methylation across WBC types as designated in a contrast matrix.
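A stochastic substitution search of this general kind can be sketched as follows. The example does not specify the scoring rule or the acceptance rule, so both are assumptions here: a panel is scored by its weakest contrast (the squared contrast of mean methylation summed over the panel's loci, minimized over the rows of the contrast matrix), and a random substitution is kept whenever it does not lower the score. The function names, locus identifiers, and parameters are hypothetical.

```python
import random


def contrast_score(panel, means, contrasts):
    """Score a panel by the weakest of its contrasts (assumed scoring rule).

    means:     dict mapping a locus name to its mean methylation per cell type
    contrasts: rows of a contrast matrix, one coefficient per cell type
    """
    return min(
        sum(
            sum(c * m for c, m in zip(contrast, means[locus])) ** 2
            for locus in panel
        )
        for contrast in contrasts
    )


def stochastic_panel_search(means, contrasts, panel_size, n_steps=2000, seed=0):
    """Randomly substitute loci, keeping substitutions that do not hurt the score."""
    rng = random.Random(seed)
    loci = sorted(means)
    panel = rng.sample(loci, panel_size)
    best = contrast_score(panel, means, contrasts)
    for _ in range(n_steps):
        outside = [locus for locus in loci if locus not in panel]
        if not outside:
            break  # every locus is already in the panel
        candidate = list(panel)
        candidate[rng.randrange(panel_size)] = rng.choice(outside)
        score = contrast_score(candidate, means, contrasts)
        if score >= best:
            panel, best = candidate, score
    return panel, best
```

Taking the minimum over contrasts favors panels in which every cell type comparison of interest is supported by at least some loci, rather than panels dominated by one easily distinguished cell type. Other scoring rules would fit the description equally well.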

A panel of 96 CpG loci was selected from which DNA methylation clearly distinguished the WBC types of interest (B-cells, T-cells, NK cells, monocytes, neutrophils, basophils, and eosinophils), as indicated by unsupervised hierarchical clustering of HDMA data for the purified WBC subsets (FIG. 50). These 96 CpG loci were placed on the LDMA, which used different chemistry than the HDMA and therefore represented an independent platform. Unsupervised hierarchical clustering of LDMA data for the purified WBC subsets showed that DNA methylation at these loci clearly and reliably distinguished the WBC types of interest (FIG. 51).

Example 57 Accurate Prediction of Purified WBC Subset Identities Using DNA Methylation

To test the performance of the method, both HDMA and LDMA derived DNA methylation data sets for the purified WBC subset samples were analyzed as if the data sets were target data sets containing unknown samples. Projection was performed using quadratic programming to quantify seven different leukocyte subtypes in each of the purified WBC subset samples using methylation signatures from the corresponding HDMA or LDMA reference library. This crosscheck procedure was used to improve efficiency by identifying any problematic purified WBC subset samples in the reference set, and to determine the minimum number of CpG loci required for accurate leukocyte subtype detection and quantification.
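The crosscheck logic can be sketched as a loop over progressively larger sets of ranked loci. This is an illustrative outline under stated assumptions, not the procedure's actual code: the weight estimator is passed in as a callable (for example, a quadratic programming projection), each purified sample is assigned to the cell type with the largest estimated weight, and all function and parameter names are hypothetical.

```python
def predict_labels(samples, reference, cell_types, loci, estimate_weights):
    """Assign each sample to the cell type with the largest estimated weight.

    samples:          list of dicts mapping a locus name to a methylation value
    reference:        dict mapping a locus name to mean methylation per cell type
    estimate_weights: callable(target, reference_rows) -> one weight per cell type
    """
    reference_rows = [reference[locus] for locus in loci]
    predictions = []
    for profile in samples:
        target = [profile[locus] for locus in loci]
        weights = estimate_weights(target, reference_rows)
        top = max(range(len(weights)), key=weights.__getitem__)
        predictions.append(cell_types[top])
    return predictions


def flag_misclassified(samples, labels, reference, cell_types, loci, estimate_weights):
    """Return the indices of purified samples whose predicted type is wrong."""
    predicted = predict_labels(samples, reference, cell_types, loci, estimate_weights)
    return [i for i, (p, known) in enumerate(zip(predicted, labels)) if p != known]


def minimum_loci(samples, labels, reference, cell_types, ranked_loci, estimate_weights):
    """Smallest number of top-ranked loci that classifies every sample correctly."""
    for k in range(1, len(ranked_loci) + 1):
        if not flag_misclassified(
            samples, labels, reference, cell_types, ranked_loci[:k], estimate_weights
        ):
            return k
    return None  # no prefix of the ranking classifies all samples correctly
```

Samples flagged at the full panel size are candidates for removal from the reference set, while the value returned by `minimum_loci` corresponds to the minimum panel size discussed in the text.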

It was observed that only 34 and 20 CpG loci were required to accurately predict the leukocyte subtype identity of unknown purified WBC subset samples using the HDMA (FIG. 52) and LDMA (FIG. 53), respectively. These loci are listed in Examples 49-50 herein. The disparity in the minimum number of loci required with each of the two platforms resulted from the fact that fewer purified WBC subset samples were analyzed using the HDMA (due to higher costs associated with that platform) and more CpG loci were therefore needed to compensate.

The methods and arrays used herein revealed that CD16-CD56bright “regulatory” NK cells should be eliminated from subsequent reference data sets, since this cell type was frequently misclassified. These cells were not present in significant numbers in peripheral blood, and were found primarily in lymphatic tissue. The purities of the regulatory NK cell samples obtained from peripheral blood were low according to FACS analysis (FIG. 48I), providing one plausible explanation for the frequent misclassification.

Example 58 Clinically Relevant Shifts in the WBC Composition Detected Using DNA Methylation

Efficacy of methods and arrays herein was analyzed by detecting specific immune modulations that occur in peripheral blood of human patients exhibiting particular clinical conditions: diminished T-cells (T-cell lymphopenia), increased granulocytes (granulocytosis), diminished granulocytes (granulocytopenia), diminished B-cells (B-cell lymphopenia), and increased monocytes (monocytosis). Genomic DNA extracted from five of the purified WBC subset samples was combined in precise quantities that mimicked the constitution of human blood found in patients exhibiting each of these clinical conditions, and in normal patients (Table 33).

DNA methylation was assessed in these DNA mixtures using both the HDMA and LDMA platforms and methods. Five different WBC types in each mixture were quantified by performing projections by quadratic programming using the appropriate reference data set, utilizing only the minimum numbers of CpG loci (34 or 20) established by the crosscheck procedure described in examples herein.

The five WBC quantities measured by the DNA methylation methods using HDMA and LDMA were comparable to the expected values (FIG. 54A and FIG. 55A). These data indicate that methods and compositions herein effectively detected five specific, clinically relevant modulations in peripheral blood immune cell samples.

Example 59 DNA Methylation Analysis Provides Accurate WBC Quantification Compared to Established Methods

Methods and arrays described herein were compared to gold standard methods of WBC quantification. Venous whole blood was collected from six different, disease-free, human donors (FIG. 47). Blood samples were analyzed using methods described herein and were compared to three different, well established immune profiling methods: manual 5-part differential, CBC with automated 5-part differential, and FACS.

Genomic DNA was extracted from the blood samples, and DNA methylation was assessed using both the HDMA and LDMA platforms and methods described in examples herein. WBC types were quantified by quadratic programming using the corresponding reference data set, and utilizing only the minimum numbers of CpG loci (34 or 20) identified by the crosscheck procedure described in examples herein. Quantities of WBC types measured by the DNA methylation methods were comparable to the results obtained using the gold standard methods (FIG. 54B-D, FIG. 55B-D, and FIG. 56).

Agreement between methods herein and the gold standard methods was excellent, and little evidence of systematic bias was observed. The mean difference between each pair of estimates was approximately zero. Standard deviations in model prediction values were determined by calculating root mean square error (RMSE) between WBC quantities measured using DNA methylation and WBC quantities measured by each of the gold standard methods (FIG. 57A-C). It was observed that the standard deviations were low. The levels of uncertainty were similar to those levels observed among the gold standard methods (FIG. 57D-F).
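The two agreement statistics named above, mean difference (as a check for systematic bias) and RMSE, follow their standard definitions and can be computed as in the short sketch below. The function names and the paired-list input layout are assumptions for illustration; no values from the figures are reproduced.

```python
import math


def _check_paired(estimates, reference):
    """Both lists must be non-empty and hold one value per paired measurement."""
    if not estimates or len(estimates) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")


def mean_difference(estimates, reference):
    """Average signed difference; values near zero suggest little systematic bias."""
    _check_paired(estimates, reference)
    return sum(e - r for e, r in zip(estimates, reference)) / len(estimates)


def rmse(estimates, reference):
    """Root mean square error between paired measurements."""
    _check_paired(estimates, reference)
    squared = sum((e - r) ** 2 for e, r in zip(estimates, reference))
    return math.sqrt(squared / len(estimates))
```

Here `estimates` would hold the proportions of one WBC type obtained by DNA methylation across blood samples, and `reference` the proportions obtained for the same samples by one of the gold standard methods.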

Example 60 Storage Conditions do not Affect WBC Estimates Obtained Using DNA Methylation

Examples analyzed whether the stability of DNA allows methods and arrays herein to overcome many limitations of previous WBC quantification methods. The DNA methods and arrays used herein did not require fresh blood or an intact cell membrane. Thus these methods and materials are useful for analyzing samples that were previously precluded from immunological assessment, such as archived blood samples that are stored in hospitals and laboratories, or blood samples collected in an anticoagulant not compatible with a particular method.

Examples analyzed whether a blood anticoagulant and/or storage temperature variations alter WBC quantification by DNA methylation methods herein. Six venous whole blood samples were collected from disease-free human donors and were contacted with an anticoagulant: citrate, heparin, or EDTA. DNA was extracted from fresh samples and also from samples stored at room temperature, 4° C. or −80° C. for at least 24 hours prior to DNA extraction (FIG. 58). DNA samples were analyzed using the LDMA platform to assess DNA methylation and to generate a target data set to consider the effects of blood storage conditions. Seven WBC types of interest were quantified in each of these target samples by performing a projection by quadratic programming using the LDMA reference set. The minimum number of 20 CpG loci established by the crosscheck procedure described above was used.

It was observed that the storage conditions examined did not alter WBC subset quantities measured in human blood by DNA methylation (FIG. 59).

1. A method for assessing a disease condition in a subject, comprising: measuring a CD3Z positive T lymphocyte cell number in a sample from the subject by analyzing methylation in the sample of at least one CpG dinucleotide (CpG) in gene CD3Z or in an orthologous or a paralogous gene thereof, wherein an amount of a demethylated C of the at least one CpG in the sample is a measure of CD3+ T lymphocyte cell number; and comparing the amount of the demethylated C in the sample from the subject with that in positive control samples from patients with the disease condition, and with that in negative control samples from healthy subjects, wherein the disease condition is selected from: an autoimmune disease, an allergy, a transplant rejection, obesity, an inherited disease, immunosuppression and a cancer.
2. The method according to claim 1, wherein assessing a disease condition comprises at least one of: monitoring, diagnosing, prognosing, and measuring response to therapy by comparing the measured CD3+ T lymphocyte cell numbers in the subject after therapy to that in the patients with the disease condition and in the healthy subjects.
3-13. (canceled)
14. A kit for measuring CD3+ T lymphocyte and FOXP3+ T regulatory cell numbers, by analyzing methylation of CpG positions in CD3Z and FOXP3 genes, the kit comprising sequencing and PCR primers specific for the CD3Z and the FOXP3 gene DMRs and instructions for analyzing and comparing methylation of the CpG positions of a subject in need of diagnosis of a disease with that of control subjects.
15. A method for assessing a disease condition by estimating an alteration in proportions of types of leukocytes in a sample from a subject, the method comprising: measuring a DNA methylation profile for each type of leukocyte and for unfractionated cells, wherein DNA methylation profiles are obtained for a plurality of CpG loci, and obtaining the status of an individual CpG locus by amplifying DNA from each of the types of leukocyte and from the unfractionated cells, wherein amplifying comprises hybridizing methylation sensitive locus-specific DNA oligomers corresponding to each CpG locus; ordering CpG loci by ability to distinguish types of leukocytes, wherein the ordering of the CpG loci determines differentially methylated DNA regions (DMRs), wherein obtaining DMRs comprises statistically minimizing introduction of bias in amount of total methylation status of a large number of CpG loci obtained from the unfractionated cells by employing a Bayesian treatment utilizing prior probabilities of the methylation status at each individual locus, thereby identifying a plurality of CpG loci to include in the measurement, wherein an amount of CpG loci distinguishes DMR signatures among the types of leukocytes and minimizes bias; obtaining DNA methylation profiles comprising DMRs from the types of leukocytes, wherein the DNA methylation profiles comprise validating measures of relative amounts of the types of leukocytes, and obtaining DNA methylation profiles of the unfractionated cells as surrogate measures of relative amounts of each type of leukocyte in the unfractionated cells; employing an analog of a measurement error model wherein a DNA methylation surrogate y is reverse formulated with respect to the disease outcome z, as y = f(z), wherein y denotes a multivariate random variable representing a methylation profile, z denotes a disease outcome or state, and f denotes a probability distribution; y, z, and leukocyte distribution ω are related by the estimator equations, E(y|ω) = g(ω), and under an assumption E(z|ω,y) = E(z|ω), wherein E denotes an expectation of a random variable and ω denotes a subject specific distribution of leukocytes; and, comparing relative amounts of each type of leukocyte in the sample from the subject with those in a control sample, thereby providing an assessment of the disease condition.
16. The method according to claim 15, wherein the locus-specific DNA oligomers are linked to an array selected from the group of: a glass slide array; a quartz slide array; a fiber optic bundle array; a planar slide array; a micro-well array; a multi-well dish array; a digital PCR array; and a bead array having beads located at known addressable locations on the array.
17-26. (canceled)
27. A method of predicting a methylation class membership in a bodily fluid sample of a subject for assessing disease status of the subject, wherein the methylation class membership corresponds to an epigenetic signature of a plurality of leukocyte types, the method comprising: measuring amounts of DNA methylation in each of a plurality of leukocyte type populations to determine differentially methylated regions (DMRs); ranking leukocyte DMRs for each leukocyte type according to statistical strength of association of the DMR with each leukocyte type; randomly dividing a data set of control subjects and subjects with a disease into groups having substantially the same numbers of control subjects and subjects with the disease to obtain a training set and a testing set; clustering samples in the training set using a defined number of highest ranked leukocyte DMRs to determine clustering solutions, wherein a clustering solution corresponds to the methylation class membership; and predicting the methylation class membership for subjects within the testing set by applying the clustering solutions obtained from the training set to the highest ranked leukocyte DMRs in the testing set, wherein clinical utility of the predicted methylation class membership is determined by testing association of the predicted methylation class membership with the disease status of the subject.
28. The method according to claim 27, wherein the highest ranked leukocyte DMRs are shown in Table 21, wherein each DMR is identified by chromosomal location and gene name, and the defined number of highest ranked leukocyte DMRs is selected from: at least 10, at least 20, at least 30, at least 40 and 50.
29-36. (canceled)
37. An array for estimating proportions of leukocyte types in a sample from a mammal for assessing a disease condition of the mammal by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the array comprising: a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, wherein the surface at each location is attached to a DNA probe having a specific nucleotide sequence, wherein the DNA probe having the specific nucleotide sequence hybridizes to a DNA sequence of a methylated form or an unmethylated form of a CpG dinucleotide in a sequence of a gene of the plurality of genes in the sample, wherein the array is selected from having: at least 16 probes, at least 64 probes, at least 96 probes, and at least 384 probes.
38. The array according to claim 37, wherein the plurality of DNA probes has nucleotide sequences that hybridize with a respective plurality of 118 different nucleotide sequences occurring in the plurality of genes.
39. The array according to claim 38, wherein the plurality of 118 nucleotide sequences comprises at least one gene or locus selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.
40-46. (canceled)
47. A method for estimating proportions of types of leukocytes in a sample from a subject for assessing a disease condition of the subject by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the method comprising: providing an array having a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, wherein the surface at each location is attached to a DNA probe having a specific nucleotide sequence; reacting genomic DNA in the sample with a bisulfite reagent to convert unmethylated cytosine residues to uracil; hybridizing resulting bisulfite treated genomic DNA with the array to obtain resulting hybridized probes on the array, wherein the DNA probes hybridize to a DNA sequence of each of a methylated form and an unmethylated form of a sequence having a CpG dinucleotide in a gene for each of the plurality of genes; and detecting the methylation status of each of the CpG dinucleotides in each sequence, thereby estimating proportions of types of leukocyte in the sample from the subject for assessing the disease condition of the subject.
49. The method according to claim 48, wherein amplifying by PCR further comprises: using primer pairs having a 5′ primer specific to each of the methylated or the unmethylated form of the CpG dinucleotide containing gene, and a 3′ primer specific to the gene containing the CpG dinucleotide, thereby obtaining a first PCR product; amplifying the first PCR product with differentially labeled 5′ primers specific for each of the methylated and the unmethylated form of the CpG dinucleotide sequence containing gene, and a common 3′ primer, thereby obtaining a differentially labeled second PCR product, and hybridizing the second PCR product to the CpG dinucleotide containing gene for measuring amount of the second PCR product, thereby detecting the methylation status of the CpG dinucleotide sequence.
50-51. (canceled)
52. The method according to claim 47, wherein the plurality of probes on the array hybridizes with a respective plurality of 118 different sequences occurring in the plurality of genes.
53. The method according to claim 52, wherein each probe on the array is complementary to at least one nucleotide sequence selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.
54. The method according to claim 47, wherein the disease condition assessed is selected from: an autoimmune disease, an allergy, a transplant rejection, obesity, an inherited disease, and a cancer.
55-58. (canceled)
59. A kit for estimating proportions of leukocyte types in a sample from a subject by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the kit comprising: an array comprising: a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, wherein the surface at each location is attached to a DNA probe having a specific nucleotide sequence, wherein the DNA probe having the specific nucleotide sequence hybridizes to a DNA sequence of a methylated form or an unmethylated form of a CpG dinucleotide in a sequence of a gene of the plurality of genes in the sample, wherein the array is selected from having: at least 16 probes, at least 64 probes, at least 96 probes, and at least 384 probes; primers and reagents for detecting the hybridized probes and for detecting the reaction products derived from the hybridized probes; and instructions for using the array with a bisulfite reagent, thereby providing an estimation of proportions of leukocyte types in the sample.
 60. (canceled)
61. The kit according to claim 59, wherein the probes have nucleotide sequences complementary to at least one selected from the group of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.
62-65. (canceled)
66. A method of treating a subject for a disease condition, wherein the subject is a human patient and wherein the disease condition is a cancer, the method comprising: obtaining signatures comprising differentially methylated regions (DMRs) from types of leukocytes in a blood sample of the patient, the types of leukocytes comprising at least one selected from: CD19+ B lymphocyte, CD15+ granulocyte, CD14+ monocyte, CD56^(dim) Natural Killer cell, CD56^(bright) Natural Killer cell, and CD3+ T lymphocyte; and from a healthy control human subject not having the cancer; comparing a signature for a specific type of leukocyte in the patient with that in the healthy subject, wherein the signature for the specific type of leukocyte is an indication of amount of cells of the specific type of leukocyte circulating in blood, and wherein a decreased amount of the cells of the specific type of leukocyte circulating in the blood of the patient compared to the healthy subject is an indicium of the cancer; and, administering a composition comprising the cells of the type of leukocyte to the patient, thereby increasing the amount of the cells of the type of leukocyte in the patient and treating the cancer.
67. The method according to claim 66, wherein the leukocyte type cell is the CD56^(dim) Natural Killer cell.
68-69. (canceled)
70. The method according to claim 67, wherein the DMR signature specific for CD56^(dim) Natural Killer cells comprises a CpG dinucleotide in a region near the promoter of the gene NKp46, wherein the methylation status of the CpG dinucleotide is quantified by methylation specific quantitative polymerase chain reaction (MS-qPCR) using primers and probes having SEQ ID NOs: 116-118 and 97-99.
71. The method according to claim 67, wherein the DMR signature specific for CD56^(dim) Natural Killer cells is a CpG dinucleotide in a region near the promoter of the gene NKp46, wherein the methylation status of the CpG dinucleotide is quantified by digital PCR comprising emulsion and nanofluidic partitioning using primers and probes having SEQ ID NOs: 116-118 and 97-99.
72-73. (canceled)
74. The method according to claim 66, wherein the signature comprises at least one gene or locus selected from the group consisting of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95, SEQ ID NO:96, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:126, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:132, SEQ ID NO:133, SEQ ID NO:134, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:139, and SEQ ID NO:140.
75. The method according to claim 74, wherein the at least one gene or locus is selected from the group consisting of: FGD2, HLA-DOB, BLK, IGSF6, CLDN15, SFT2D3, ZNF22, CEL, HDC, GSG1, FCN1, OSBPL5, LDB2, NCR1, EPS8L3, CD3D, PPP6C, CD3G, TXK, and FAIM.
76. The method according to claim 74, wherein the at least one gene or locus is selected from the group consisting of: CLEC9A (2 loci), INPP5D, INHBE, UNQ473, SLC7A11, ZNF22, XYLB, HDC, RGR, SLCO2B1, C1orf54, TM4SF19, IGSF6, KRTHA6, CCL21, SLC11A1, FGD2, TCL1A, MGMT, CD19, LILRB4, VPREB3, FLJ10379, HLA-DOB, EPS8L3, SHANK1, CD3D (2 loci), CHRNA3, CD3G (2 loci), RARA, and GRASP.